Strongly Consistent Global Indexes for Phoenix

DataWorks Summit
Without transactional tables, global indexes can easily get out of sync with their data tables in Phoenix. Transactional tables require a separate transaction manager, impose restrictions and performance penalties, and are still in beta. This technical talk lays out a design for strongly consistent global indexes without the need for an external transaction manager. In addition to strongly consistent indexing, the proposed design aims for minimal impact on read performance, minimal code changes, and significant operational simplification by eliminating index rebuilds. Our implementation of the design and our initial performance testing have been very promising toward achieving these goals.

In Phoenix, global indexing is implemented using a separate table for each secondary index of a table. Updating a table with one or more global indexes requires updating multiple table regions, likely distributed over multiple region servers. Translating a single-table update operation into a multi-table write operation poses consistency issues, as Phoenix does not provide a reliable multi-table update capability without using transactional tables.

Updating multiple tables (in our case, a data table and its indexes) atomically requires implementing a form of two-phase commit protocol, that is, a transaction capability. In the general transactional update problem, there is no special relationship among the updates to be made over multiple tables within a transaction. In other words, one cannot derive the update for one table from the updates for the other tables within the same transaction. Therefore, the content of a transaction usually has to be logged on a durable medium before being committed to the individual tables, in order to be able to recover from failures.

Achieving strongly consistent global indexing does not really require implementing a general-purpose transaction capability. The reason is that the update for an index table can always be derived from the update for the data table. This means that if a global index is corrupted, lost, or becomes inconsistent with its data table, it can be rebuilt from the data table. This observation allows us to come up with a solution that leverages this property and is optimized for the secondary indexing problem.
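As a minimal illustration of this property (the key layout here is a hypothetical simplification; Phoenix's actual row key encoding is more involved), the index row key can be recomputed from the data row alone:

    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: an index row is a pure function of its data row. Here the
    // index row key is simply the indexed column value followed by the
    // data row key, so a lost or corrupted index row can always be
    // rebuilt by rescanning the data table.
    class IndexKeyDerivation {
        static byte[] indexRowKey(byte[] indexedValue, byte[] dataRowKey) {
            return Bytes.add(indexedValue, dataRowKey);
        }
    }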

Another important observation is that HBase is a log-structured data store; that is, updates are never done in place. Writes in such systems are much faster than in in-place-update systems because random writes are handled as fast as sequential writes. This allows us to add an extra write phase during updates without severely impacting write performance, which simplifies the overall design.

The proposed design differs significantly from the current one: it adds an extra write phase, changes the order of operations, and maintains a per-row status on index tables. It updates a data table and its index tables using a three-phase write approach. In the first phase, the index table rows are updated in parallel with an "unverified" status. This verification status is maintained per row and stored in a column of every global index table for non-transactional tables. If updating any of the index tables for a given data table fails after a number of retries, a write failure is returned to the application. This means that some of the index tables may have been updated with unverified rows. However, this does not pose a consistency problem, since data from unverified rows is never used to serve reads (i.e., SQL queries). Such rows are either rebuilt from the data table or, if the corresponding data table rows do not exist, deleted during read operations (i.e., using a read-repair technique).
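A minimal sketch of this first phase, using the plain HBase client API (the column family, status qualifier, and class names are illustrative assumptions, not the actual Phoenix schema):

    import java.io.IOException;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    class Phase1 {
        static final byte[] CF = Bytes.toBytes("0");            // assumed family
        static final byte[] STATUS = Bytes.toBytes("VERIFIED"); // assumed qualifier
        static final byte[] UNVERIFIED = Bytes.toBytes(false);

        // Write the unverified index rows to all index tables in parallel.
        // If any write still fails after the client's retries, the failure
        // surfaces in join() and the caller returns a write failure to the
        // application.
        static void writeUnverified(List<Table> indexTables, List<Put> indexPuts) {
            CompletableFuture<?>[] writes = new CompletableFuture<?>[indexTables.size()];
            for (int i = 0; i < indexTables.size(); i++) {
                final Table table = indexTables.get(i);
                final Put put = indexPuts.get(i);
                put.addColumn(CF, STATUS, UNVERIFIED); // mark the row unverified
                writes[i] = CompletableFuture.runAsync(() -> {
                    try {
                        table.put(put); // the HBase client retries internally
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            CompletableFuture.allOf(writes).join();
        }
    }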

In the second phase, which happens only if the first phase succeeds, the data table is updated. If the data table update fails, a write failure is returned to the application. Again, since the rows written to the index tables in the first phase are still unverified, and data from unverified rows is never used, data table write failures in the second write phase do not lead to correctness issues, just as index table write failures in the first write phase do not.
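Continuing the sketch under the same illustrative assumptions, the second phase is a plain data table write whose failure is safe by construction:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;

    class Phase2 {
        // Runs only after every index table has accepted its unverified row.
        // If this put fails, the caller returns a write failure to the
        // application; the unverified index rows left behind are harmless
        // because reads never serve unverified data.
        static void writeDataRow(Table dataTable, Put dataPut) throws IOException {
            dataTable.put(dataPut);
        }
    }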

In the third phase, which happens only if the second phase succeeds, the index table rows are updated with the "verified" status or deleted. A failure during the third phase is simply ignored, as such failures are recovered during read operations on the index tables. After the third write phase, the completion status is returned to the application.
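And a sketch of the third phase, with the same illustrative schema as above (the delete path for row deletions is omitted for brevity): each index row is flipped to "verified", and a failure is deliberately swallowed because read repair fixes the row later.

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    class Phase3 {
        static final byte[] CF = Bytes.toBytes("0");            // assumed family
        static final byte[] STATUS = Bytes.toBytes("VERIFIED"); // assumed qualifier
        static final byte[] VERIFIED = Bytes.toBytes(true);

        static void markVerified(List<Table> indexTables, List<byte[]> indexRowKeys) {
            for (int i = 0; i < indexTables.size(); i++) {
                Put put = new Put(indexRowKeys.get(i));
                put.addColumn(CF, STATUS, VERIFIED);
                try {
                    indexTables.get(i).put(put);
                } catch (IOException e) {
                    // Ignored by design: an unverified row is either rebuilt
                    // from the data table or deleted by the next read that
                    // touches it (read repair).
                }
            }
        }
    }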

In this talk, we will present the details of the proposed design, the proof of its correctness, and our performance tests and results.