I have a table with 1.4 billion records. The table structure is as follows:
CREATE TABLE text_page (
text VARCHAR(255),
page_id INT UNSIGNED
) ENGINE=MYISAM DEFAULT CHARSET=ascii
The requirement is to create an index over the column text.
The table size is about 34G.
I have tried to create the index by the following statement:
ALTER TABLE text_page ADD KEY ix_text (text)
After waiting 10 hours I finally gave up on this approach.
Is there any workable solution to this problem?
UPDATE: the table is unlikely to be updated, inserted into, or deleted from. The reason for creating an index on the column text is that this kind of SQL query will be executed frequently:
SELECT page_id FROM text_page WHERE text = ?
UPDATE: I have solved the problem by partitioning the table.
The table is partitioned into 40 pieces on the column text. Creating the index then took about 1 hour to complete.
It seems that MySQL index creation becomes very slow when the table gets very big, and partitioning breaks the table into smaller chunks.
Could it be that your system is just not up to the task? I do not use MySQL (SQL Server here), but I know the pain of indexing an 800 million row table. Basically, you need the right hardware for that (as in: lots of fast discs). I now use nearly a dozen Velociraptors and performance is great ;)
SQL servers (not MS SQL Server specifically, but database servers using SQL in general) live and die by disc access, and normal discs are just not up to the task of larger operations.
You might want to create an index on the first (for instance, 10) characters of the text field.
From the Docs:
Indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length:
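A minimal sketch of that approach (the prefix length of 10 is only an example; you would pick it based on how selective the leading characters actually are):
ALTER TABLE text_page ADD KEY ix_text (text(10));
-- optional: check how selective a 10-character prefix is before committing to it
SELECT COUNT(DISTINCT LEFT(text, 10)) / COUNT(*) FROM text_page;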
I have solved the problem by partitioning the table.
The table is partitioned into 40 pieces on the column text. Creating the index then took about 1 hour to complete.
It seems that MySQL index creation becomes very slow when the table gets very big, and partitioning breaks the table into smaller chunks.
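A minimal sketch of what that could look like (the use of KEY partitioning is my assumption; the post does not say which partitioning scheme was used):
ALTER TABLE text_page
    PARTITION BY KEY (text)
    PARTITIONS 40;
ALTER TABLE text_page ADD KEY ix_text (text);
Each partition then gets its own, much smaller index tree, which is what makes the build tractable.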
Set sort_buffer_size to 4 GB (or however much you can, depending on how much memory you have).
Right now the index creation is doing a sort, but since you have a 32 MB sort_buffer_size, it is basically thrashing the hard drive needlessly.
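A sketch of how that could be set for the session before rebuilding the index (the exact values are assumptions, use what your memory allows; raising myisam_sort_buffer_size as well is my addition, since MyISAM index builds use it):
SET SESSION sort_buffer_size = 1024 * 1024 * 1024;        -- 1 GB, adjust to available memory
SET SESSION myisam_sort_buffer_size = 1024 * 1024 * 1024; -- used by MyISAM when rebuilding indexes by sort
ALTER TABLE text_page ADD KEY ix_text (text);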
If you do not need to make range or prefix queries on text (that is, you only ever need exact matches like the query above), I would suggest creating a new hash column and indexing the table by that column. The overall size of the table + index might be much smaller.
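A minimal sketch of that hash-column idea (the column name text_hash and the choice of CRC32 are assumptions; the original text is re-checked in the query to filter out hash collisions):
ALTER TABLE text_page ADD COLUMN text_hash INT UNSIGNED;
UPDATE text_page SET text_hash = CRC32(text);
ALTER TABLE text_page ADD KEY ix_text_hash (text_hash);
-- lookups filter on the 4-byte hash first, then re-check the full text
SELECT page_id FROM text_page
WHERE text_hash = CRC32('some text') AND text = 'some text';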
UPD: By the way, 1.4 billion 4-byte integers occupy about 6 GB, so in a 34 GB table the average string length is less than 30 characters, which means indexing on a prefix might be preferable.
You should also take a look at the MERGE storage engine.
One way to do this is to create a new table with the index set and copy the data to the new table.
Also, make sure you have enough temp space.
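A sketch of that copy-into-a-pre-indexed-table approach (the name text_page_new is a hypothetical placeholder):
CREATE TABLE text_page_new (
    text VARCHAR(255),
    page_id INT UNSIGNED,
    KEY ix_text (text)
) ENGINE=MYISAM DEFAULT CHARSET=ascii;
INSERT INTO text_page_new SELECT text, page_id FROM text_page;
-- once the copy is verified, swap the tables
RENAME TABLE text_page TO text_page_old, text_page_new TO text_page;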
In case you're still wondering how best to do this, I would suggest using an online ALTER TABLE tool.
There are a lot of them on the internet; one of the well-known ones is:
http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html
We have the same issues with big tables (more than 500 million records) and the altering works perfectly. It creates a new tmp table, adds triggers on the original table (to capture new update/delete/insert records) and in the meantime copies all the records to the new table (with the new structure).
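A sketch of what the invocation could look like (the database name in the DSN is a placeholder; note that pt-online-schema-change requires the table to have a primary key or unique index for its triggers to work):
pt-online-schema-change --alter "ADD KEY ix_text (text)" D=your_database,t=text_page --execute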
Good luck!