That is, we don't have a single YYYYMMDDHHMIN partition.
When newly-arrived developers meet our Hive data, they almost inevitably make a series of mistakes, in a particular order. They start with something like this:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT event FROM my_table | |
WHERE create_date > '2016-11-15' AND create_date <= '2016-12-15' |
Then, with partitions in hand, they try this:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT event FROM my_table | |
WHERE YYYY=2016 AND ((MM=11 AND DD>=15) OR (MM=12 AND DD<15)); |
create_date
is outside the date range. Like an event we don't want because it has a too-early create_date
of 2016-11-14 23:59:59
but it slips into the YYYY=2016,MM=11,DD=15 partition due to time skew.They almost get it right by using both the date range and the partition range:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT event FROM my_table | |
WHERE create_date > '2016-11-15' AND create_date <= '2016-12-15' | |
AND YYYY=2016 AND ((MM=11 AND DD>=15) OR (MM=12 AND DD<15))); |
create_date
's that fall within the date range but are not in those partitions. Like an event with a create_date
of 2016-12-14 23:59:59
that slips into the unincluded YYYY=2016,MM=12,DD=15 partition.They need to add some extra "slop", like one hour, on each end of the partition range so that they're finding all the events within the date range.
They end up with what I believe to be the "correct" query:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT event FROM my_table | |
WHERE create_date > '2016-11-15' AND create_date <= '2016-12-15' | |
AND YYYY=2016 AND | |
( | |
(MM=11 AND DD=14 AND HH>=23) | |
OR (MM=11 AND DD>14) | |
OR (MM=12 AND DD<15) | |
OR (MM=12 AND DD=15 AND HH<01) | |
); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT event FROM my_table | |
WHERE create_date > '2016-12-15' AND create_date <= '2017-01-15' | |
AND | |
( (YYYY=2016 AND MM=12 AND DD=14 AND HH>=23) | |
OR (YYYY=2016 AND MM=12 AND DD>14) | |
OR (YYYY=2017 AND MM=01 AND DD<15) | |
OR (YYYY=2017 AND MM=01 AND DD=15 AND HH<01) | |
) |