Simplepush Blog

Web scraping dynamic content created by JavaScript with Python

Scraping websites that contain dynamic content created by JavaScript sounds easier than it is. This is partly because browser technology is constantly evolving, which forces web scraping libraries to change with it. As a result, many articles on the topic reference deprecated libraries like PhantomJS and dryscrape, which makes it difficult to find up-to-date information.

In this article we will show you how to scrape dynamic content with Python and Selenium in headless mode. Selenium is a browser automation library. Unlike an HTML parser such as BeautifulSoup, it drives a real browser and can therefore handle website content that was loaded by JavaScript.

To make things more exciting, we will do so with an example that has a real-life use case: sending a notification to your Android or iOS device when certain TeamSpeak users enter or leave a given TeamSpeak server.

First make sure to install Selenium and the Simplepush library.

sudo pip3 install selenium
sudo pip3 install simplepush

Then we need to make sure that ChromeDriver is installed.

On Ubuntu or Raspbian:

sudo apt install chromium-chromedriver

On Debian:

sudo apt install chromium-driver

On macOS:

brew install --cask chromedriver

Now we can start coding.

#!/usr/bin/python3
from multiprocessing import Process, Manager
from requests.exceptions import ConnectionError
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from simplepush import send
import time

# Get this URL from the tsviewer.com search
TSVIEWER_URL = "https://www.tsviewer.com/index.php?page=ts_viewer&ID=1111111"
# If you squint, you can derive the TSVIEWER_ID from TSVIEWER_URL
TSVIEWER_ID = "ts3viewer_1111111"
# You will immediately get your personal Simplepush key after installing the Simplepush app
SIMPLEPUSH_KEY = "YourSimplepushKey"
# The usernames of your friends you want to be notified about
FRIENDS = ["TeamSpeakUser"]

def update(friends_online):
    # Build the options inside the function so that this also works when the
    # child process is spawned instead of forked (e.g. on macOS)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(TSVIEWER_URL)
        # Wait until JavaScript loaded a div where the id is TSVIEWER_ID
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, TSVIEWER_ID)))
        html = driver.page_source

        # This check unfortunately seems to be necessary since sometimes WebDriverWait doesn't do its job
        if TSVIEWER_ID in html:
            for friend in FRIENDS:
                send_notification_on_change(friend, f"{friend} entered the server", f"{friend} left the server", html, friends_online)
    except Exception as e:
        # Report the error and try again in the next iteration
        print(f"Error: {e}")
    finally:
        # quit() closes all windows and ends the ChromeDriver process
        driver.quit()

def send_notification_on_change(name, message_join, message_leave, html, friends_online):
    # Compare case-insensitively
    name = name.lower()
    html = html.lower()

    # The friend shows up on the page but was not online before
    if name in html and name not in friends_online:
        try:
            friends_online.append(name)
            send(SIMPLEPUSH_KEY, "A friend joined", message_join)
        except ConnectionError:
            # Undo the change so that we retry on the next check
            friends_online.remove(name)

    # The friend was online before but is no longer on the page
    if name in friends_online and name not in html:
        try:
            friends_online.remove(name)
            send(SIMPLEPUSH_KEY, "A friend left", message_leave)
        except ConnectionError:
            # Undo the change so that we retry on the next check
            friends_online.append(name)

if __name__ == "__main__":
    # The managed list is shared between the main process and the subprocesses
    manager = Manager()
    friends_online = manager.list()

    try:
        while True:
            p = Process(target=update, args=(friends_online,))
            p.start()
            # make the main process wait for `update` to end
            p.join()
            # all memory used by the subprocess will be freed to the OS
            time.sleep(5)
    except (KeyboardInterrupt, SystemExit):
        print("Stopped")

Did you notice how we use the multiprocessing library to start Selenium in its own process? Otherwise our program could run out of memory, since Python has difficulties collecting unused WebDriver instances. By running each WebDriver instance inside its own process, we make sure that all memory is released back to the OS once that process finishes.
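
The same pattern can be tried without Selenium. The following is a minimal sketch, not the scraper itself: `worker` is a hypothetical stand-in for `update` that allocates a large buffer, and `run_isolated` and `BUFFER_SIZE` are illustrative names. Each call runs in a fresh child process, so the buffer goes back to the OS when that process exits, while the result survives in the managed list.

```python
from multiprocessing import Manager, Process

BUFFER_SIZE = 50 * 1024 * 1024  # illustrative size, stands in for the browser's memory use


def worker(shared_state):
    # Stand-in for the scraping step: allocate a large object, then record a result.
    big_buffer = bytearray(BUFFER_SIZE)
    shared_state.append(len(big_buffer))


def run_isolated(shared_state):
    # Run one iteration in a child process and wait for it to finish.
    p = Process(target=worker, args=(shared_state,))
    p.start()
    p.join()
    return p.exitcode  # 0 if the child finished without an error


if __name__ == "__main__":
    with Manager() as manager:
        results = manager.list()
        for _ in range(3):
            run_isolated(results)
        # Copy the proxy into a plain list before the manager shuts down
        print(list(results))
```

Only what is written to the managed list crosses the process boundary. Everything else the child allocated disappears with it.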

Now if you run our little program, it will check tsviewer.com repeatedly, pausing five seconds between checks, to see if one of our friends joined or left the server (as defined by TSVIEWER_URL and TSVIEWER_ID). If so, it will send a notification to the Simplepush key defined by SIMPLEPUSH_KEY.
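
The join/leave bookkeeping can also be separated from the scraping and the sending, which makes it easy to test on its own. The sketch below assumes the same case-insensitive substring matching as the program above. `detect_changes` is a hypothetical helper, not part of Selenium or the Simplepush library.

```python
def detect_changes(page_text, friends, online):
    """Return (joined, left) name lists and update the `online` set in place."""
    text = page_text.lower()
    joined, left = [], []
    for friend in friends:
        name = friend.lower()
        present = name in text
        if present and name not in online:
            online.add(name)
            joined.append(friend)
        elif not present and name in online:
            online.discard(name)
            left.append(friend)
    return joined, left


# Two consecutive "page loads" with made-up HTML
online = set()
print(detect_changes("<div>Alice</div>", ["Alice", "Bob"], online))  # (['Alice'], [])
print(detect_changes("<div>Bob</div>", ["Alice", "Bob"], online))    # (['Bob'], ['Alice'])
```

Keep in mind that substring matching reports a friend as online whenever the name appears anywhere in the page source, so very short usernames can produce false positives.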

Support for critical alerts on iOS

The new version of our iPhone/iPad app adds support for Apple’s Critical Alerts.

Critical Alerts can bypass Do Not Disturb mode and your mute switch. If you want to use Critical Alerts on your iOS device, you can edit or create an event and activate Critical Alerts there. Also make sure to grant Simplepush permission for Critical Alerts if you want to use them.

Events screen with Critical Alerts turned on

Sending (end-to-end encrypted) push notifications from your Raspberry Pi Zero W

The Raspberry Pi Zero W is an impressively small device with a single-core 1 GHz CPU and 512 MB of RAM. You can do all kinds of great stuff with it. For me the most interesting use cases happen in a headless setup where you do not connect your Raspberry Pi to any peripherals. See this great explanation on how to set up your Raspberry Pi Zero W without any monitor or keyboard.

When running your Raspberry Pi headlessly, you will probably want your Pi to send information to you at some point. This is where Simplepush comes into play.

With Simplepush, a few lines of bash are enough to send push notifications to your Android device. The following example sends the amount of available memory on your Pi as a push notification to the user with the Simplepush key HuxgBB. Replace this key with your own. You get your key by installing the Simplepush app, no registration required.

msg=$(grep "MemAvailable:" /proc/meminfo) && curl "https://api.simplepush.io/send/HuxgBB/Available memory on your Pi/$msg" &> /dev/null
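
If you prefer Python on your Pi, the same idea can be sketched with the standard library only. The URL layout (key, title and message as path segments) is taken from the curl example above. The helper names `mem_available`, `build_url` and `notify` are hypothetical, and percent-encoding each segment is an assumption about what the API accepts rather than documented behaviour.

```python
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.simplepush.io/send"


def mem_available(meminfo_text):
    """Return the 'MemAvailable:' line from /proc/meminfo content, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return " ".join(line.split())  # collapse the column padding
    return None


def build_url(key, title, message):
    # Percent-encode every path segment so that spaces and colons survive
    parts = [quote(part, safe="") for part in (key, title, message)]
    return "/".join([API_BASE] + parts)


def notify(key):
    # Read the current value and send it as a notification
    with open("/proc/meminfo") as f:
        msg = mem_available(f.read())
    if msg is None:
        raise RuntimeError("MemAvailable not found in /proc/meminfo")
    urlopen(build_url(key, "Available memory on your Pi", msg), timeout=10)
```

Calling `notify("HuxgBB")` with your own key in place of HuxgBB would then send the notification.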

With Simplepush this also works with end-to-end encryption:

sudo apt-get install git
git clone https://github.com/simplepush/send-encrypted.git
sudo cp send-encrypted/simplepush.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/simplepush.sh
msg=$(grep "MemAvailable:" /proc/meminfo) && simplepush.sh -k HuxgBB -t "Available memory on your Pi" -m "$msg" -p yourpassword -s yoursalt

You can set your password in the encryption section of the app. This is also where you find your salt.

We also provide libraries for sending push notifications (normal and end-to-end encrypted) from within programming languages.

ESP8266 encrypted notifications with Simplepush

The ESP8266 is a low-cost system-on-a-chip (SoC) which comes with integrated WiFi and a full TCP/IP stack. It is the perfect match for prototyping IoT projects or for just tinkering around.

With Simplepush you can send push notifications to your smartphone.

In this blog post you will learn how to combine these two components in a secure way, so that nobody can intercept your data, neither on your local network nor at nodes (e.g. Google) on the way from your ESP8266 to your smartphone.

The chip can be programmed in Lua, JavaScript and, with the Arduino environment, C++ (among other languages). For our task of sending encrypted notifications from the ESP8266 to a smartphone, we chose the Arduino environment.

Demo with a NodeMcu development board.

With Arduino libraries it is easy to make HTTPS requests from your ESP8266. However, at the time of writing, using SSL on the ESP8266 still comes with certain problems. For example, you have to hard-code the SHA-1 fingerprint of the SSL certificate of the domain you want to connect to. This is a problem since certificates get replaced over time, and their fingerprints change with them.
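
To see why a pinned fingerprint is fragile, it helps to look at how one is computed: it is the SHA-1 hash of the DER-encoded certificate, so every new certificate yields a different value. Here is a minimal sketch in Python, where `sha1_fingerprint` and `fetch_fingerprint` are illustrative names and the colon-separated output is just one common way of writing a fingerprint.

```python
import hashlib
import ssl


def sha1_fingerprint(der_bytes):
    """Format the SHA-1 digest of a DER-encoded certificate as 'AA:BB:...'."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_fingerprint(host, port=443):
    # Download the server certificate (PEM), convert it to DER and hash it
    pem = ssl.get_server_certificate((host, port))
    return sha1_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Running `fetch_fingerprint` against the same host before and after a certificate renewal would return two different strings, which is exactly what breaks firmware with a hard-coded value.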

When it comes to sending push notifications from the ESP8266 to your smartphone, we can sidestep this problem by using Simplepush’s encryption feature. This means sending encrypted notifications (AES-CBC-128) via HTTP.

Just download our library and import the ZIP file in the Arduino IDE (as described here). Then sending notifications (both encrypted and unencrypted) is as easy as shown in the following code snippet.

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <ESP8266WiFiMulti.h>
#include <Simplepush.h>

ESP8266WiFiMulti WiFiMulti;
Simplepush simple;

void setup() {
  // Connect ESP8266 to WiFi
  WiFiMulti.addAP("YourWifiSSID", "WifiPassword");

  if (WiFiMulti.run() == WL_CONNECTED) {
    // Send unencrypted notification
    simple.send("YourSimplepushKey", "Wow", "This is so easy", "Event");

    // Send encrypted notification
    simple.sendEncrypted("YourSimplepushKey", "password", "salt", "Wow", "This is so secure.", "Event");
  }
}

void loop() {
  // Nothing to do here, the notifications are sent once in setup()
}

For password and salt, fill in the password you chose and the salt provided by the app (both can be found in the encryption section of the app).

Click here for a more complete example.
