In a previous post I described my project for stock market data , which synced stock price data to a local storage, PyStore (Swedish stock market data – pt. 1). Recently I rewrote the syncing. Since I wanted to learn more about SQL servers, I set up a PostgreSQL server with the TimescaleDB plugin on a Linux box I have running in a closet. Thus I here describe in a short post what I updated, the database schema, and the sync script.

To make the script work you need to install the SQL software (instructions PostgreSQL and TimescaleDB). There are numerous alternatives on how to install them, e.g. .deb package, compile source, docker etc, just make sure you match the versions, i.e. that TimescaleDB has support for your chosen PostgreSQL version if you make a “normal” system install.
Once you have the database set up, create a user called stock_scraper
if you want something else, remember to change in appropriate places below.
DB creation and schema
As mentioned before, first make sure you have a user called stock_scraper
, this makes access control easier. Our script will access the database through this user. There are several ways, and more secure ways for access. My server runs locally on my LAN only, so I think it is enough.

\l
command lists all databases. Good user management is a good start for a secure system.Next we create our database, after logging in as psql admin you create the database with
CREATE DATABASE stock_data;
Then we enter the database and add the TimescaleDB extension
\c stock_data; CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
After this we can create the tables:
--SQL statements for a database schema stock analysis and modeling -- in TimescaleDB --Author: Magnus V. Persson --Schema for ticker_list table -- Relational table DROP TABLE IF EXISTS "ticker_list"; CREATE TABLE "ticker_list"( ticker VARCHAR (15) NOT NULL, ticker_intl VARCHAR (15) PRIMARY KEY NOT NULL, name TEXT, isin TEXT, icb TEXT, market TEXT NOT NULL, ); ALTER TABLE ticker_list OWNER TO stock_scraper; --Schema for ticker_prices table --Timescale table DROP TABLE IF EXISTS "ticker_prices"; CREATE TABLE "ticker_prices"( date DATE NOT NULL, ticker_intl VARCHAR (15) NOT NULL, open DOUBLE PRECISION, low DOUBLE PRECISION, high DOUBLE PRECISION, close DOUBLE PRECISION, volume BIGINT, stock_splits DOUBLE PRECISION, dividends DOUBLE PRECISION, PRIMARY KEY (ticker_intl, date) ); ALTER TABLE ticker_prices OWNER TO stock_scraper; --Timescale specific statements to create hypertables for better performance SELECT create_hypertable('ticker_prices', 'date', 'ticker_intl','open', 3);
In the schema script, we also make sure that our user, stock_scraper
is the owner of the tables, since we are creating them as psql root/admin.

\c stock_data
to connect to the database, and \dt
to list the tables. Of course you can also run queries.It is also possible to manage the system with pgAdmin (c.f. phpMyAdmin).
Installed Python packages
The packages you need are:
yfinance psycopg2 pandas holidays numpy sqlalchemy
They can all be installed from the Anaconda package manager ‘conda’ with these commands
conda install -c ranaroussi yfinance
conda install -c anaconda psycopg2
conda install numpy holidays pandas
conda install -c anaconda sqlalchemy
see here and here for info about the less common packages yfinance
and psycopg2
.
Stock market data syncing
Now, to sync stocks, I took some parts of the old script, namely the part where I fetch the list of stocks to get (from NASDAQ Sweden’s webpage). EDIT: The script for this has been updated on GitHub. The one problem with this approach is that it contains some non-swedish stocks, BUT they will not be synced, since it will not find them on Yahoo Finance.
The script and associated files can be found here: https://github.com/vilhelmp/stock_sync_2. I will just mention some of the details of the script, what some of the functions do and so on.
Configuration files
The config.ini
and database.ini
files are used to set up the connection to the TimescaleDB database and various aspects of checking for stocks. Config.ini looks like this.
[logging] logpath=/PATH/TO/LOG/DIRECTORY [stocks] market_closes = 16 waitformarket = 30 start_date = 2000-01-01
The entries are pretty self-explanatory. Enter the path for storing session logs. waitformarket
is to make sure the data is updated on the server before we sync, it could probably be 0. start_date
is so you can decide how far back you want to sync the data. All of these settings except log files does not really matter that much if you set up the script to sync automatically, say at night.
The second configuration file is database.ini
, it should look something like this:
[postgresql] host=192.168.0.100 database=STOCK_DB user=DB_USER password=UR_PASSWORD
It just contains the information to connect to you database. Put in the relevant information. This could of course be a database server anywhere that you have access to that have this kind of authentication method.
Files
The help_functions.py
files simply contains functions to read configuration files, connect to the database, check last date synced, and what date to sync to (if market has not closed yet, sync until yesterday, unless its a weekend).
The file init_logging.py
does exactly that; sets up logging.
To be able to sync the stocks we need the names of the stocks, just as in the first post about this. The file get_stock_lists.py
takes care of this, run this first. It needs to be modified in a couple of places, the most important being the connection to the database on the line that says
create_engine('postgresql+psycopg2://PG_USER:PG_PASSWORD@PG_IP:5432/STOCK_DB')
Where each part is self-explanatory. Secondly, if the script fails to get the table from the webpage, it falls back to a local (downloaded) version. Just go to the various pages and save them to disk and put the path in the file where it says
basepath = '/PATH/WHERE/TO/SAVE/WEBPAGE'
That is it, run this after you have set up the database, but before you run the main script.
Lastly, sync_stock_prices.py
is the script that does what we want, syncs the ticker prices. It first checks the database which stocks need syncing, and which date to sync from (last date synced). It then gets the data and updates the database. That is it, now you have synced stock market data. After you have gone through the syncing process you have an up to date stock data database. Sometimes you have to run the script several times, since Yahoo Finance uses rate limiting. After this you are ready to get data from you own database.
As mentioned before, the files can be found here: https://github.com/vilhelmp/stock_sync_2
Querying our DB from Python
So now we can populate the database, what about retrieving data? I will show an example.
import pandas as pd import psycopg2 def load_data(schema, table, ticker): # First we set parameters for access details PGHOST = "IP/HOST OF YOUR DB-SERVER" PGDATABASE = "stock_data" # the name of our table PGUSER = "stock_scraper" # your stock_scraper username PGPASSWORD = "YOUR PostgreSQL PASSWORD" # Set up a connection to the postgres server. conn_string = "host="+ PGHOST +" port="+ "5432" +" dbname="+ \ PGDATABASE +" user=" + PGUSER \ +" password="+ PGPASSWORD conn=psycopg2.connect(conn_string) # Create a cursor object cursor = conn.cursor() sql_command = f"SELECT * FROM {str(schema)}.{str(table)} WHERE ticker_intl=\'{ticker}\';" print (sql_command) # Load the data data = pd.read_sql(sql_command, conn) # and don't leave the connection open conn.close() return (data)
Now a simple df = load_data(public,ticker_prices,'ABB.ST')
will fetch all stock prices for company ABB into a Pandas dataframe.
Just to mention, this means you will hard-code your access tokens to the server, which in general is not great. Put it in a configuration file that you read or something.
The next step is to make a dashboard, and I have!
Interesting project. I for sure want to hear more!
Thanks, if I only had more time!