0

How can I speed up a query that's currently doing a full table scan to get the maximum value of a timestamp column?

I have a large table with a timestamp column. There is a TPump running all day with many sessions inserting and updating rows, serialized using the PI. I need to run the equivalent of:

SELECT MAX(ts) FROM TABLE1

many times a day, which does a full table scan. Is there any index that can be added or something that can be done to speed this up?

  • A normal NUSI will not be used by Teradata because it is not ordered.
  • A value-ordered NUSI cannot be created on a timestamp column.
  • I cannot use an aggregate join index that precomputes the max because the AGI will have a different PI from the table, causing the different TPump sessions to block each other when updating the singe AGI value.
  • I cannot reduce the number of TPump sessions to 1 to avoid the AGI blocking problem because the feed requires a high update rate.

The best solution I have up to now is to change the tpump to keep another table updated that keeps the max(ts) for large enough groups of rows, so that the number of groups is small enough to make the SELECT max(ts) run fast, but large enough to get enough parallelism in the TPump to not become a bottleneck for the TPump. The TPump would have to use a different serialization key appropriate for this table's PI.

I would prefer is something could be done inside Teradata without having to create more tables or change the Tpump.

flag

1 Answer

1

Haven't had the opportunity to work with TPump, but I am going to throw this out there anyway because it could be useful for any large table that you need to access by a date or timestamp value.

Could you create partitioned primary index and partition on the extracted date value from the timestamp? Then you would only be scanning the records in the single partition if you knew that the timestamp was always greater than the current day at midnight:

SELECT MAX(ts) FROM Table1 WHERE ts > CAST(CURRENT_DATE() AS TIMESTAMP);

That is making the assumption that you know the MAX(ts) will be some time during the day that you are running the query and that you can modify the SELECT statement as well.

If the timestamp is giving you a problem on the partitioned primary index you could always create a separate column that is the date portion of the timestamp record to partition on.

link|flag
The query probably has to be WHERE CAST(ts as DATE) >= CURRENT_DATE(), so that Teradata realizes it can use partition elimination. However, since the partitioning expression would not be part of the PI, partitioning might significantly affect the TPump performance. I have to check how many partitions would have data per each PI value. Thanks. – Carlos A. Ibarra Mar 15 at 15:16

Your Answer

Get an OpenID
or

Not the answer you're looking for? Browse other questions tagged or ask your own question.