In a previous post I described my project for syncing Swedish stock price data to local storage with PyStore (Swedish stock market data – pt. 1). Recently I rewrote the syncing. Since I wanted to learn more about SQL servers, I set up a PostgreSQL server with the TimescaleDB extension on a Linux box I have running in a closet. In this short post I describe what I updated, the database schema, and the sync script.

Showcase of the candlestick plotting functionality of my stock dashboard.
Figure 1. Small showcase of the candlestick plotting functionality of my stock dashboard. The code and instructions for the dashboard will be posted shortly.

To make the script work you need to install the database software (installation instructions: PostgreSQL and TimescaleDB). There are numerous ways to install them, e.g. .deb package, compiling from source, Docker, etc. Just make sure you match the versions, i.e. that TimescaleDB supports your chosen PostgreSQL version if you do a “normal” system install.

Once you have the database software set up, create a user called stock_scraper. If you want a different name, remember to change it in the appropriate places below.

DB creation and schema

As mentioned, first make sure you have a user called stock_scraper; this makes access control easier. Our script will access the database through this user. There are several, more secure ways to handle access, but my server runs only on my LAN, so I think this is enough.
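Creating the user can be done from the psql prompt while logged in as the postgres admin. A minimal sketch, with a placeholder password you should replace:

```sql
-- A login role for the sync script; replace the password
CREATE ROLE stock_scraper WITH LOGIN PASSWORD 'change_me';
```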

Figure 2. The first time you connect to your PostgreSQL server, you do it as the postgres admin user. The \l command lists all databases. Good user management is a good start for a secure system.

Next we create our database. After logging in as the postgres admin user in psql, create the database with

CREATE DATABASE stock_data;

Then we enter the database and add the TimescaleDB extension

\c stock_data
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

After this we can create the tables:

--SQL statements for a database schema for stock analysis and modeling
-- in TimescaleDB
--Author: Magnus V. Persson

--Schema for ticker_list table
-- Relational table
DROP TABLE IF EXISTS "ticker_list";
CREATE TABLE "ticker_list"(
      ticker          VARCHAR (15) NOT NULL,
      ticker_intl     VARCHAR (15) PRIMARY KEY NOT NULL,
      name            TEXT,
      isin            TEXT,
      icb             TEXT,
      market          TEXT NOT NULL
      );
ALTER TABLE ticker_list OWNER TO stock_scraper;

--Schema for ticker_prices table
--Timescale table
DROP TABLE IF EXISTS "ticker_prices";
CREATE TABLE "ticker_prices"(
      date               DATE NOT NULL,
      ticker_intl        VARCHAR (15) NOT NULL,
      open               DOUBLE PRECISION,
      low                DOUBLE PRECISION,
      high               DOUBLE PRECISION,
      close              DOUBLE PRECISION,
      volume             BIGINT,
      stock_splits       DOUBLE PRECISION,
      dividends          DOUBLE PRECISION,
      PRIMARY KEY (ticker_intl, date)
      );
ALTER TABLE ticker_prices OWNER TO stock_scraper;

--Timescale specific statements to create hypertables for better performance
SELECT create_hypertable('ticker_prices', 'date', 'ticker_intl', 3);
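To check that the conversion worked you can query TimescaleDB's information views (the view name below assumes TimescaleDB 2.x; 1.x named it timescaledb_information.hypertable):

```sql
-- List all hypertables in the current database
SELECT hypertable_name FROM timescaledb_information.hypertables;
```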

In the schema script, we also make sure that our user, stock_scraper, is the owner of the tables, since we are creating them as the postgres admin.

Figure 3. As a small side note, once things are created you can run \c stock_data to connect to the database, and \dt to list the tables. Of course you can also run queries.

It is also possible to manage the server with pgAdmin (cf. phpMyAdmin).

Installed Python packages

The packages you need are:

yfinance psycopg2 pandas holidays numpy sqlalchemy

They can all be installed from the Anaconda package manager ‘conda’ with these commands

conda install -c ranaroussi yfinance

conda install -c anaconda psycopg2

conda install numpy holidays pandas

conda install -c anaconda sqlalchemy

see here and here for info about the less common packages yfinance and psycopg2.

Stock market data syncing

Now, to sync stocks, I reused parts of the old script, namely the part that fetches the list of stocks to get (from NASDAQ Sweden’s webpage). EDIT: The script for this has been updated on GitHub. One problem with this approach is that the list contains some non-Swedish stocks, but they will not be synced, since they are not found on Yahoo Finance.

The script and associated files can be found here: https://github.com/vilhelmp/stock_sync_2. I will just mention some of the details of the script, what some of the functions do and so on.

Configuration files

The config.ini and database.ini files are used to set up the connection to the TimescaleDB database and various aspects of the stock checking. config.ini looks like this:

[logging]
logpath=/PATH/TO/LOG/DIRECTORY

[stocks]
market_closes = 16
waitformarket = 30
start_date = 2000-01-01

The entries are pretty self-explanatory. Enter the path for storing session logs. market_closes is the hour the market closes, and waitformarket makes sure the data is updated on the server before we sync; it could probably be 0. start_date lets you decide how far back you want to sync the data. None of these settings except the log path really matters much if you set up the script to sync automatically, say at night.
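As a sketch of how such a file can be read from Python with the standard library configparser (the section and option names follow the example above):

```python
from configparser import ConfigParser
from datetime import date

# Inline the example config.ini so the snippet is self-contained;
# in practice you would call parser.read("config.ini") instead
parser = ConfigParser()
parser.read_string("""
[stocks]
market_closes = 16
waitformarket = 30
start_date = 2000-01-01
""")

market_closes = parser.getint("stocks", "market_closes")  # closing hour
waitformarket = parser.getint("stocks", "waitformarket")
start_date = date.fromisoformat(parser.get("stocks", "start_date"))
print(market_closes, waitformarket, start_date)
```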

The second configuration file is database.ini, it should look something like this:

[postgresql]
host=192.168.0.100
database=STOCK_DB
user=DB_USER
password=UR_PASSWORD

It just contains the information needed to connect to your database. Put in the relevant information. This could of course be a database server anywhere that you have access to that uses this kind of authentication.

Files

The help_functions.py file simply contains functions to read the configuration files, connect to the database, check the last date synced, and determine what date to sync to (if the market has not closed yet, sync until yesterday, unless it's a weekend).
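That last rule can be sketched with the standard library alone. This is a hypothetical helper, assuming the market_closes hour from config.ini; the real help_functions.py may implement it differently, and this sketch ignores market holidays (the holidays package in the dependency list suggests the real script handles those):

```python
from datetime import datetime, timedelta

def sync_until(now, market_closes=16):
    """Return the last date we can expect complete price data for.

    If the market has not closed yet, only sync up to yesterday,
    and step back over weekends to the last weekday.
    """
    day = now.date()
    if now.hour < market_closes:
        day -= timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day

# Monday 2024-05-06 at 10:00, before close: sync until Friday 2024-05-03
print(sync_until(datetime(2024, 5, 6, 10)))
```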

The file init_logging.py does exactly that; sets up logging.

To be able to sync the stocks we need their names, just as in the first post about this. The file get_stock_lists.py takes care of this; run it first. It needs to be modified in a couple of places, the most important being the database connection on the line that says

create_engine('postgresql+psycopg2://PG_USER:PG_PASSWORD@PG_IP:5432/STOCK_DB')

where each part is self-explanatory. Secondly, if the script fails to get the table from the webpage, it falls back to a local (downloaded) version. Just go to the various pages, save them to disk, and put the path in the file where it says

basepath = '/PATH/WHERE/TO/SAVE/WEBPAGE'

That is it. Run this after you have set up the database, but before you run the main script.

Lastly, sync_stock_prices.py is the script that does what we want: it syncs the ticker prices. It first asks the database which stocks need syncing and which date to sync from (the last date synced). It then fetches the data and updates the database. Sometimes you have to run the script several times, since Yahoo Finance uses rate limiting. After that, you have an up-to-date stock database and are ready to query it.
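The "which date to sync from" check boils down to a query along these lines (a sketch; the actual script may phrase it differently):

```sql
-- Last stored date per ticker; syncing resumes the day after
SELECT ticker_intl, max(date) AS last_synced
FROM ticker_prices
GROUP BY ticker_intl;
```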

As mentioned before, the files can be found here: https://github.com/vilhelmp/stock_sync_2

Querying our DB from Python

So now we can populate the database, what about retrieving data? I will show an example.

import pandas as pd
import psycopg2

def load_data(schema, table, ticker):
    # First we set the access details
    PGHOST = "IP/HOST OF YOUR DB-SERVER"
    PGDATABASE = "stock_data"     # the name of our database
    PGUSER = "stock_scraper"      # your stock_scraper username
    PGPASSWORD = "YOUR PostgreSQL PASSWORD"

    # Set up a connection to the PostgreSQL server
    conn_string = (f"host={PGHOST} port=5432 dbname={PGDATABASE} "
                   f"user={PGUSER} password={PGPASSWORD}")
    conn = psycopg2.connect(conn_string)

    # Build the query; the ticker is passed as a parameter so the
    # driver escapes it properly
    sql_command = f"SELECT * FROM {schema}.{table} WHERE ticker_intl = %s;"
    print(sql_command)

    # Load the data into a dataframe
    data = pd.read_sql(sql_command, conn, params=(ticker,))

    # and don't leave the connection open
    conn.close()

    return data

Now a simple df = load_data('public', 'ticker_prices', 'ABB.ST') will fetch all stock prices for the company ABB into a Pandas dataframe.

Just to mention: this means you hard-code your database credentials in the script, which in general is not great. Put them in a configuration file that you read instead.
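For instance, the database.ini file shown earlier can be reused for this too. A minimal sketch with the standard library configparser (the temporary file here only stands in for a real database.ini on disk):

```python
from configparser import ConfigParser
import os
import tempfile

def read_db_config(path="database.ini", section="postgresql"):
    """Read connection parameters into a dict that can be passed
    straight to psycopg2.connect(**read_db_config())."""
    parser = ConfigParser()
    parser.read(path)
    return dict(parser.items(section))

# Demonstrate with a temporary file mirroring database.ini above
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write("[postgresql]\nhost=192.168.0.100\ndatabase=stock_data\n"
            "user=stock_scraper\npassword=secret\n")
    path = f.name

params = read_db_config(path)
os.remove(path)
print(params["host"], params["user"])
```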

The next step is to make a dashboard, and I have!