Recommended query practices
In addition to using the dashboards supplied by TrafficPeak, you can write your own queries for deeper knowledge about the performance of your TrafficPeak services.
Keep in mind that querying TrafficPeak data demands extra care due to the sometimes enormous amounts of data involved. Writing queries for multi-billion row data sets requires a change of mindset.
At this scale, it’s easy to get into a situation where queries are slow and time out, possibly affecting other users in your organization. Here are some tips to help produce the best results when you’re writing your own queries.
General advice
- Always limit a query’s output by using time constraints. Queries unbounded by time will surprise you with the amount of data returned and the processing required.
- Keep your search narrow by limiting the number of columns in your query to just what you need. Don’t use SELECT *. TrafficPeak is a column-oriented database. It’s different from a row-based database. Data in adjacent columns isn't stored in an adjacent manner, possibly requiring an undue amount of work to retrieve data you don’t need.
- Consider creating a Summary Table. At this scale, even a 30% reduction in compute, memory, and storage can be significant. Many summaries will save 95% of resources while still presenting the same data.
Specific tips
- Be careful with LIMIT. LIMIT is a valuable tool, especially when you’re debugging a new query for syntax and other problems. However, don’t rely on it to protect you from mistakes on a large dataset, even though the TrafficPeak database has some safeguards in place for this. Don’t scan 100 billion rows to return a LIMIT of 50 rows.
- LIMIT your results before using ORDER and GROUP. Ordering 100 billion rows is going to take a long time and use lots of memory. In the same vein, GROUPing by billions of high-cardinality fields is an invitation to failure.
- Use predicates. These are statements in your
WHERE
,HAVING
, andFROM
clauses that evaluate toTRUE
orFALSE
. Predicates are your greatest path to simple, fast queries, so use them whenever you can. TrafficPeak will push predicate knowledge to the lowest levels of the system, saving you time and money. - Limit JOINs. JOINs are expensive at this scale. If you must use JOIN, make sure you first omit all the columns you don’t need. There are often better ways to accomplish what you need without using a complicated JOIN statement.
- Watch your Common Table Expressions (CTE). CTEs are still queries under the hood, so be aware of them and make sure they obey these guidelines, too.
Grafana dashboards
- Don’t pipe data from SELECT into a Grafana widget. Select just the columns you need.
- Keep the count of your result sets as low as possible while still showing what you need to see. Grafana will only show a limited number of data points per widget in the low thousands.
- Make sure your dashboard filters are efficiently written. Dashboard filters run every time the page is loaded, so they can slow down the entire dashboard.
- Be aware of Grafana’s lazy loading and use it to your advantage. If there’s a widget that’s expensive to generate and is rarely used, put it below the fold so those queries are only executed when necessary.
- Use Summary Tables to provide a separate dashboard for management and stakeholders. Save the unsummarized, low-latency, full-fidelity dashboard for you and your team. This helps prevent query slowness based on sudden sharing of dashboards among many people.
Updated 13 days ago