
Tuning PostgreSQL with pgbench

When it comes to performance tuning an environment, the database is often the first place to start, because most applications rely heavily on a database of some sort.

Unfortunately, databases can also be one of the most complex areas to tune. Tuning a database service properly often involves more than the database service itself; it frequently requires hardware, OS, or even application modifications.

On top of requiring a diverse skill set, one of the biggest challenges with tuning a database is generating enough simulated traffic to actually stress the database service. That is why today’s article explores pgbench, a benchmarking tool used to measure the performance of a PostgreSQL instance.

PostgreSQL is a highly popular open-source relational database. One of the nice things about PostgreSQL is the number of tools that have been created to assist with managing it; pgbench is one such tool.

While exploring pgbench, we will also use it to measure the performance gains and losses from a common PostgreSQL tunable.

Setting Up a PostgreSQL Instance

Before we can use pgbench to tune a database service, we must first stand up that database service. The steps below outline how to set up a basic PostgreSQL instance on an Ubuntu 16.04 server.

Installing with apt-get

Installing PostgreSQL on an Ubuntu system is fairly easy. The bulk of the work is accomplished by simply running the apt-get command.

# apt-get install postgresql postgresql-contrib

The above apt-get command installs both the postgresql and postgresql-contrib packages. The postgresql package installs the base PostgreSQL service.

The postgresql-contrib package installs additional extensions and utilities contributed to PostgreSQL. These are not part of the core server package but often provide quite a bit of useful functionality.
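
If you are curious what contrib actually provides, one easy way to check (once connected with psql, as shown later in this article) is the pg_available_extensions catalog view, which lists the extensions available for installation; on a stock install, most of these come from the contrib package.

postgres=# SELECT name, default_version, comment FROM pg_available_extensions ORDER BY name;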

With the packages installed, we now have a running PostgreSQL instance. We can verify this by using the systemctl command to check the status of PostgreSQL.

# systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
   Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2017-01-02 21:14:36 UTC; 7h ago
  Process: 16075 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 16075 (code=exited, status=0/SUCCESS)

Jan 02 21:14:36 ubuntu-xenial systemd[1]: Starting PostgreSQL RDBMS...
Jan 02 21:14:36 ubuntu-xenial systemd[1]: Started PostgreSQL RDBMS.

The above indicates our instance started without any issues. We can now move on to our next step, creating a database.

Creating a database

When we installed the postgresql package, the installation created a user named postgres. This user owns the running instance and also serves as the admin user for the PostgreSQL service.

In order to create a database, we will need to log in as this user, which is accomplished by executing the su command.

# su - postgres

Once switched to the postgres user, we can log in to the running instance by using the PostgreSQL client, psql.

$ psql
psql (9.5.5)
Type "help" for help.

postgres=#

After executing the psql command, we were dropped into PostgreSQL’s command line environment. From here, we can issue SQL statements or use special client commands to perform actions.

As an example, we can list the current databases by issuing the \list command.

postgres-# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

After issuing the \list command, three databases were returned. These are default databases that were set up during the initial installation process.

For our testing today, we will be creating a new database. Let’s go ahead and create that database, naming it example. We can do so by issuing the following SQL statement:

CREATE DATABASE example;
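
As a side note, the same result can be achieved without entering psql at all; PostgreSQL ships with a createdb wrapper command that can be run directly from the shell as the postgres user.

$ createdb example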

Once executed, we can validate that the database has been created by issuing the \list command again.

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 example   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

At this point, we have an empty database named example. To execute pgbench commands, we will need to return to our bash shell, which we can do by issuing the \q (quit) command.

postgres-# \q

Once logged out of the PostgreSQL command line environment, we can get started using pgbench to benchmark our database instance’s performance.

Using pgbench to Measure Performance

One of the most difficult things in measuring database performance is generating enough load. A popular option is to simply bombard test instances of the target application(s) with test transactions. While this is a useful test that shows database performance in relation to the application, it can be problematic, as bottlenecks in the application itself can limit how much load actually reaches the database.

For situations such as this, tools like pgbench come in handy. With pgbench, you can either use a sample database provided with pgbench or have pgbench run custom queries against an application database.
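
To give a feel for the custom-query mode, below is a minimal sketch of a script passed to pgbench with its -f flag. The table and column names (app_accounts, account_id) are hypothetical stand-ins for an application’s schema, and the random() function in \set requires pgbench 9.6 or later (older releases used \setrandom instead).

$ cat custom.sql
\set id random(1, 1000000)
SELECT * FROM app_accounts WHERE account_id = :id;

$ pgbench -c 10 -j 2 -t 10000 -f custom.sql example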

In this article, we will be using the example database that comes with pgbench.

Setting up the pgbench sample database

The setup of the sample database is quite easy and fairly quick. We can start this process by executing pgbench with the -i (initialize) option.

$ pgbench -i -s 50 example
creating tables...
5000000 of 5000000 tuples (100%) done (elapsed 5.33 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

In the command above, we executed pgbench with the -i option and the -s option followed by the database name (example).

The -i (initialize) option will tell pgbench to initialize the database specified. What this means is that pgbench will create the following tables within the example database.

table                   # of rows
---------------------------------
pgbench_branches        1
pgbench_tellers         10
pgbench_accounts        100000
pgbench_history         0

By default, pgbench will create the tables above with the row counts shown, resulting in a simple 16MB database.

Since we will be using pgbench to measure changes in performance, a small 16MB database will not be enough to stress our instance. This is where the -s (scaling) option comes into play.

The -s option is used to multiply the number of rows entered into each table. In the command above, we entered a “scaling” option of 50. This told pgbench to create a database with 50 times the default size.

What this means is our pgbench_accounts table now has 5,000,000 records. It also means our database size is now 800MB (50 x 16MB).

To verify that our tables have been created successfully, let’s go ahead and run the psql client again.

$ psql -d example
psql (9.5.5)
Type "help" for help.

example=#

In the command above, we used the -d (database) flag to tell psql to not only connect to the PostgreSQL service but to also switch to the example database.

Since we are currently using the example database, we can issue the \dt command to list the tables available within that database.

example=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)

From the output above, we can see that pgbench created the four expected tables. This means our database is now populated and ready to be used to measure our database instance’s performance.
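
We can also double-check the scale arithmetic from the shell. The row count of pgbench_accounts and the on-disk size of the example database are both easy to query; expect the reported size to be in the neighborhood of the 800MB estimate rather than an exact match, since indexes and other overhead are included.

$ psql -d example -c "SELECT count(*) FROM pgbench_accounts;"
$ psql -d example -c "SELECT pg_size_pretty(pg_database_size('example'));"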

Establishing a baseline

When doing any sort of performance tuning, it is best to first establish a baseline. This baseline serves as the measurement for determining whether the changes you make increase or decrease performance.

Let’s go ahead and call pgbench to establish the baseline for our “out of the box” PostgreSQL instance.

$ pgbench -c 10 -j 2 -t 10000 example

When calling pgbench, we add quite a few options to the command. The first is -c (clients), which is used to define the number of clients to connect with. For this testing, I used 10 to tell pgbench to execute with 10 clients.

What this means is that when pgbench is executing tests, it opens 10 different sessions.

The next option is the -j (threads) flag, which is used to define the number of worker processes for pgbench. In the above command, I specified a value of 2, telling pgbench to start two worker processes during the benchmark. (One caveat to keep in mind: pgbench requires the number of clients to be a multiple of the number of threads.)

The third option used is -t (transactions), which is used to specify the number of transactions to execute. In the command above, I provided a value of 10,000. However, this doesn’t mean that only 10,000 transactions will be executed against our database service; it means that each client session will execute 10,000 transactions.
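
As an aside, if a time-based run is more convenient than a fixed transaction count, pgbench also supports a -T (time) option that runs the benchmark for a given number of seconds instead. For example, the following would run the same client and thread configuration for 60 seconds:

$ pgbench -c 10 -j 2 -T 60 example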

To summarize, the baseline test run was two pgbench worker processes simulating 10,000 transactions from 10 clients for a total of 100,000 transactions.

With that understanding, let’s take a look at the results of this first test.

$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 4.176 ms
tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)

The output of pgbench has quite a bit of information. Most of it describes the test scenarios being executed. The part that we are most interested in is the following:

tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)

From these results, it seems our baseline is 2,394 database transactions per second. Let’s go ahead and see if we can increase this number by modifying a simple configuration parameter within PostgreSQL.

Adding More Cache

One of the go-to parameters for anyone tuning PostgreSQL is the shared_buffers parameter. This parameter is used to specify the amount of memory the PostgreSQL service can utilize for caching. This caching mechanism is used to store the contents of tables and indexes in memory.

To show how we can use pgbench for performance tuning, we will be adjusting this value to test performance gains/losses.

By default, the shared_buffers value is set to 128MB, a fairly low value considering the amount of available memory on most servers today. We can see this setting for ourselves by looking at the contents of the /etc/postgresql/9.5/main/postgresql.conf file. Within this file, we should see the following.

# - Memory -

shared_buffers = 128MB                  # min 128kB
                                        # (change requires restart)
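
We can also confirm the value the running instance is actually using by issuing a SHOW statement from within psql:

postgres=# SHOW shared_buffers;
 shared_buffers 
----------------
 128MB
(1 row)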

Let’s go ahead and switch this value to 256MB, effectively doubling our available cache space.

# - Memory -

shared_buffers = 256MB                  # min 128kB
                                        # (change requires restart)
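
As an alternative to editing the file by hand, PostgreSQL 9.4 and later can persist the same change with an ALTER SYSTEM statement, which writes the setting to postgresql.auto.conf. Either way, changing shared_buffers still requires a restart to take effect.

postgres=# ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM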

Once completed, we will need to restart the PostgreSQL service. We can do so by executing the systemctl command with the restart option.

# systemctl restart postgresql

Once the service is fully up and running, we can once again use pgbench to measure our performance.

$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.921 ms
tps = 2550.313477 (including connections establishing)
tps = 2550.480149 (excluding connections establishing)

In our earlier baseline test, we were able to hit a rate of 2,394 transactions per second. In this last run, after updating the shared_buffers parameter, we were able to achieve 2,550 transactions per second, an increase of 156. While this is not a bad start, we can still go further.

While the shared_buffers parameter might start off at 128MB, the recommended value for this parameter is one-fourth the system memory. Our test system has 2GB of system memory, a value we can verify with the free command.

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           2000          54         109         548        1836        1223
Swap:             0           0           0

In the output above, the total column of the Mem: row shows 2000MB, the total physical memory available to the system. The available column shows 1223MB, which means we have roughly 1.2GB of memory we could put toward our tuning efforts.

If we change our shared_buffers parameter to the recommended value of one-fourth system memory, we would need to change it to 512MB. Let’s go ahead and make this change and rerun our pgbench test.

# - Memory -

shared_buffers = 512MB                  # min 128kB
                                        # (change requires restart)

With the shared_buffers value updated in the /etc/postgresql/9.5/main/postgresql.conf file, we can go ahead and restart the PostgreSQL service.

# systemctl restart postgresql

After restarting, let’s rerun our test.

$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.756 ms
tps = 2662.750932 (including connections establishing)
tps = 2663.066421 (excluding connections establishing)

This time, our system was able to reach 2,662 transactions per second, an additional increase of 112 transactions per second. Since our transactions per second increased by at least 100 both times, let’s go a step further and see what happens when changing this value to 1GB.

# - Memory -

shared_buffers = 1024MB                 # min 128kB
                                        # (change requires restart)

After updating the value, we will need to once again restart the PostgreSQL service.

# systemctl restart postgresql

With the service restarted, we can now rerun our test.

$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.744 ms
tps = 2670.791865 (including connections establishing)
tps = 2671.079076 (excluding connections establishing)

This time, our transactions per second went from 2,662 to 2,671, an increase of only 9 transactions per second. At this point, we are hitting diminishing returns.

While many environments could feasibly increase the shared_buffers value beyond the one-fourth guideline, doing so does not return the same results for this test database.

Summary

Based on the results of our testing, we can see that changing the value of shared_buffers from 128MB to 512MB on our test system resulted in an increase of 268 transactions per second. Compared with our baseline of 2,394, that is roughly an 11 percent improvement in performance.

We did all of this on a stock PostgreSQL instance using pgbench’s sample database, meaning we did not have to load our application to get a baseline metric on how well PostgreSQL performs.

While we were able to increase our throughput by modifying the shared_buffers parameter within PostgreSQL, there are many more tuning parameters available. For anyone looking to tune a PostgreSQL instance, I would highly recommend checking out PostgreSQL’s wiki.
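
A good first stop when exploring those parameters is the pg_settings catalog view, which reports each parameter’s current value along with a context column indicating whether changing it requires a restart. For example, to review shared_buffers:

$ psql -c "SELECT name, setting, unit, context FROM pg_settings WHERE name = 'shared_buffers';"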

Reference: Tuning PostgreSQL with pgbench from our JCG partner Ben Cane at the Codeship Blog.

Benjamin Cane

Benjamin Cane is a systems architect in the financial services industry. He writes about Linux systems administration on his blog and has recently published his first book, the Red Hat Enterprise Linux Troubleshooting Guide.