Get To Know Your Users
If you want to help your database users or just want to know were your database resources are going you should check the database system tables. If you get lucky there will be a table monitoring executed queries.
In Vertica this table is v_monitor.query_requests:
Column Name | Data Type | Description |
---|---|---|
NODE_NAME | VARCHAR | Name of the node that is reporting the requested information. |
USER_NAME | VARCHAR | Name of the user who issued the query at the time Vertica recorded the session. |
SESSION_ID | VARCHAR | Identifier for this session. |
This identifier is unique within the cluster at any point in time but can be reused when the session closes. | ||
REQUEST_ID | INTEGER | Unique identifier of the query request in the user session. |
TRANSACTION_ID | INTEGER | Identifier for the transaction within the session, if any; otherwise NULL. |
STATEMENT_ID | INTEGER | Unique numeric ID for the currently-running statement. NULL indicates that no statement is currently being processed. |
The combination of TRANSACTION_ID, STATEMENT_ID, and REQUEST_ID uniquely identifies a statement within a session. | ||
REQUEST_TYPE | VARCHAR | Type of the query request. Examples include, but are not limited to: |
- QUERY | ||
- DDL | ||
- LOAD | ||
- UTILITY | ||
- TRANSACTION | ||
- PREPARE | ||
- EXECUTE | ||
- SET | ||
- SHOW | ||
REQUEST | VARCHAR | Query statement. |
REQUEST_LABEL | VARCHAR | Label of the query, if available. |
SEARCH_PATH | VARCHAR | Contents of the search path. |
MEMORY_ACQUIRED_MB | FLOAT | Memory acquired by this query request in megabytes. |
SUCCESS | BOOLEAN | Value returned if the query successfully executed. |
ERROR_COUNT | INTEGER | Number of errors encountered in this query request (logged in ERROR_MESSAGES table). |
START_TIMESTAMP | TIMESTAMPTZ | Beginning of history interval. |
END_TIMESTAMP | TIMESTAMPTZ | End of history interval. |
REQUEST_DURATION | TIMESTAMPTZ | Length of time in days, hours, minutes, seconds, and milliseconds. |
REQUEST_DURATION_MS | INTEGER | Length of time the query ran in milliseconds. |
IS_EXECUTING | BOOLEAN | Distinguishes between actively-running (t) and completed (f) queries. |
For a limited time you get a lot of details out of it. However you don’t want to spend every day looking into it. Instead you need to summarize that information constantly.
Business Intelligence tools tend to generate complicated queries with sometimes long runtimes. Those you want to capture as a whole. For other users you are more interested in the global picture.
So we need 2 tables to store that information:
CREATE TABLE dba.queries ( request_time timestamp, user_name varchar(128), request varchar(64000), request_label varchar(128), count int, max_request_duration_ms int, max_request_duration_minutes int SET USING ((queries.max_request_duration_ms / 1000) / 60) ); CREATE TABLE dba.bi_queries ( request_time timestamp, user_name varchar(128), request_label varchar(128), request_duration_ms int, memory_acquired_mb float, request varchar(64000), transaction_id int, statement_id int, request_duration_minutes int SET USING ((bi_queries.request_duration_ms / 1000) / 60) );
Now all we need is a cron job querying the system table once a day and capturing the information:
INSERT INTO dba.queries(request_time, user_name, request, request_label, count, max_request_duration_ms) SELECT date_trunc('HOUR', start_timestamp) AS request_time, user_name, regexp_replace(regexp_replace(regexp_replace(regexp_replace(request::VARCHAR(20000), '([<>]=?) ?[^ ]*', '\1 ?', 1, 0, 'i'), '= ?[^ ]*', '= ?', 1, 0, 'i'), 'IN ?\(([^)]*\))', 'IN ? ', 1, 0, 'i'), 'BETWEEN ?[^ ]* ?AND ?[^ ]*', 'BETWEEN ? AND ?', 1, 0, 'i') AS request, request_label, COUNT(*) AS count, MAX(request_duration_ms) FROM v_monitor.query_requests WHERE date_trunc('DAY', start_timestamp) = date_trunc('DAY', timestampadd(DAY, -1, statement_timestamp())) AND request_type = 'QUERY' AND user_name NOT IN ('dbadmin', 'monitoring') GROUP BY 1, 2, 3, 4;
Here we use a regular expression to remove the parameters and replace them with a question mark. Then we group the queries per hour, count them and capture the maximum runtime for that kind of query. With this information you get to know which queries you should optimize first. You are even able to highlight how the usage of your database changes over time.
Again for the BI tools we want to see the full queries:
INSERT INTO dba.bi_queries(request_time, user_name, request_label, request_duration_ms, memory_acquired_mb, request, transaction_id, STATEMENT_ID) SELECT start_timestamp AS request_time, user_name, request_label, request_duration_ms, memory_acquired_mb, request::VARCHAR(20000), transaction_id, STATEMENT_ID FROM v_monitor.query_requests WHERE date_trunc('DAY', start_timestamp) = date_trunc('DAY', timestampadd(DAY, -1, statement_timestamp())) AND request_type = 'QUERY' AND user_name IN ('cognos', 'tableau') AND request ILIKE 'SELECT%' AND request_duration_ms > 1000;
Here we collect only queries with a runtime of more than 1 second.