Sorry I am newbie to Elasticsearch and I am using elasticsearch-river-jdbc to connect to myql database,everything is working fine except for the fact that everytime it run as per schedule its inserting duplicate record.This is what I am using
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
"type" : "jdbc",
"schedule" : "0 0-59 0-23 ? * *",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/test",
"user" : "test",
"password" : "test",
"sql" : "select * from test"
}
}'
I go through some docs where it's mentioned that we can run sql query select based on _id but my question is when we created this river only then this unique id is created and that is created on the Elasticsearch side so as per my understanding mysql has no knowledge about this.Please let me know if I am missing something
So if I am writing sql satement like this
"sql" : "select id as _id,a1,a2 from test"
[2015-03-10 13:16:00,018][ERROR][river.jdbc.RiverPipeline ] com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'id' in 'field list'
java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'id' in 'field list'
Workaround for this issue is,I need to SELECT one of the fields as '_id' for it to work
Now the other issue is when its writing data back to ES the data and time format changed to UTC
There is already thread related to this but no workaround
https://stackoverflow.com/questions/12969481/jprante-elasticsearch-jdbc-river-changing-the-date-value
Solution for this issue is to use timezone in jdbc block
Also I am saving date and time in separate field in mysql DB
Elasticsearch uses Joda timeformat to save date. Hence it's automatically converting my date to datetime.
In the date field, since I don't have time, it is automatically adding zero's to it.
Since I need to display data via Kibana that why I need this split..I converted format of date and time as varchar(20) as a workaround(bad idea I know) and its working fine now ..