In a nutshell, how do I connect DynamoDB to AWS QuickSight, given these criteria?
In my existing solution, data arrives at AWS IoT and invokes a rule, which sends it to Lambda. Lambda decodes it and puts it into DynamoDB. Each record in DynamoDB actually contains several readings/observations for the resource, so the data is not 'tidy' from a statistician's perspective. My desired outcome is to get the data into QuickSight and to be able to perform the same depth of analysis that I could in R, but with the massive benefits of automation, a pretty dashboard and worldwide distribution. This question is specifically about how to get the data to QuickSight (which has no import available at present for NoSQL databases), and where the data cleaning should be done.
I have tried or considered:
- Streaming the data through Lambda (with a trigger on new records), unrolling it in Lambda and forwarding it to S3 using Kinesis Firehose. From there, you can connect Athena to the dataset and import that into QuickSight. But I lose the existing records, since the trigger only fires on new ones. And it seems messy to me.
- I have read some blogs suggesting the use of AWS Glue and/or Redshift for a data lake, and others suggesting that I migrate the data into a relational database first. Many of these suggestions would take weeks to test out.
- Take a snapshot of the entire database on a weekly basis and just plonk that into S3. This seems wasteful and not inherently scalable, but I might be underestimating the capability of AWS.
- Instead of using those services, schedule a Lambda to run weekly, scan the whole table, pull it apart to create a massive CSV file and put it in S3. It might be a crazy approach, but it's a simple way to take care of all the data cleaning and even to cross it with other datasets up front. For instance, imagine a customer details dataset and a customer purchase history dataset, which will be much larger. It might be nice to merge them together in Lambda to remove the need for complex dataset joining. Or it might be a very bad idea. I'm really not sure, hence the question.
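To make the "pull it apart" step concrete, here is a minimal Python sketch of the unrolling I have in mind. The field names (`device_id`, `timestamp`, `readings`, `sensor`, `value`) are made up for illustration and are not my real schema. Scanning the table and uploading the result to S3 are deliberately left out.

```python
import csv
import io

# Hypothetical column layout for the tidy output (one row per reading).
FIELDNAMES = ["device_id", "timestamp", "sensor", "value"]


def unroll_record(record):
    """Turn one record holding several readings into one row per reading."""
    rows = []
    for reading in record.get("readings", []):
        rows.append({
            "device_id": record["device_id"],
            "timestamp": record["timestamp"],
            "sensor": reading["sensor"],
            "value": reading["value"],
        })
    return rows


def rows_to_csv(rows):
    """Serialise tidy rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In the weekly Lambda, each scanned item would go through `unroll_record` and the combined rows through `rows_to_csv` before being written to S3. In the streaming variant, the same function would run on each new record instead.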
On data cleaning, to complicate matters, I'm interested in three databases, and I need to join them. That is easy in R, but seemingly very difficult in this scenario.
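For reference, the kind of join I mean is roughly the following, sketched in plain Python. The key names used in the usage note (`device_id`, `customer_id`) are hypothetical. It assumes each key value appears at most once on the right-hand side and that everything fits in memory, which may not hold for the larger tables.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`.

    Assumes `key` is unique within right_rows. Left rows with no match
    are kept unchanged; on a column name clash the left value wins.
    """
    index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(row.get(key), {})
        joined.append({**match, **row})
    return joined
```

Joining three datasets is then two calls, for example `left_join(left_join(readings, devices, "device_id"), customers, "customer_id")`, which is the equivalent of chaining two `merge(..., all.x = TRUE)` calls in R.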