Using Papa Parse and Highcharts to display real-time data from Amazon S3

If you read my previous post, you’ll know that I work on web applications that support industrial processes, and that my framework of choice is Laravel. Many of these processes involve acquiring data from manufacturing equipment. Our standard acquisition rate is one sample per millisecond per channel. For example, if we sample 10 channels every millisecond in a process that repeats every 10 seconds, we collect 100,000 data points per cycle (10 channels * 1,000 samples per second * 10 seconds).
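
The same arithmetic applies to any channel count and cycle length. Here is a minimal sketch, where pointsPerCycle is a hypothetical helper name and every channel is assumed to be sampled at a fixed 1,000 samples per second:

```javascript
// Hypothetical helper: estimates how many data points one process cycle produces.
// Assumes every channel is sampled at the same fixed rate.
function pointsPerCycle(channels, cycleSeconds, samplesPerSecond = 1000) {
    return channels * samplesPerSecond * cycleSeconds;
}

console.log(pointsPerCycle(10, 10)); // 100000, the example above
```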

Scenario

In this case, our client’s process required us to collect data on 8 channels, and the process repeated every 7 seconds. They knew they wanted all of the data but weren’t sure how they would store it and eventually ingest it, so we had to store it in a way that left their options open. They also wanted production staff to be able to monitor the detailed data in real time, which meant we had to store it in a way that was easy to consume on the web.

Our Solution

Backend

Knowing that practically all business intelligence tools and data warehouse solutions support importing CSV data, we settled on storing the data in Amazon S3 in CSV format. After the hardware collected the data, it sent the file to S3. I knew there were fabulous PHP packages, such as thephpleague’s csv, so I could get to the data pretty easily from the backend.

The more I thought about it, though, the less it seemed worth pushing all of this data through the backend, given that the client had no requirements beyond seeing the realtime data. The backend needed to know that a new CSV file had landed on S3, but the contents of the file weren’t relevant to it.

Because we were already heavily relying on queues (Amazon SQS in this instance), I decided to add an S3 event that would invoke a Lambda function each time a file was added. When the Lambda function was invoked and received the event payload from S3, we simply pushed a message on to the queue that conformed to what Laravel was expecting (I discussed this in the previous post).

Basically, the message needs to have a job key and a data key, and the job value needs to be the fully-qualified name of the job class in your Laravel application.

I had a job named App\Jobs\CsvAddedToS3, and we passed the path to the file (the object key in the S3 event payload) in the data along with the stand that generated the file. Once the queue worker picks up the message and hands it to the job, the job uses the path to the CSV and the AWS SDK to generate a signed URL, then broadcasts an event (App\Events\S3CsvAdded) that includes the signed URL and the stand ID.
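
To make the Lambda step more concrete, here is a rough sketch of building the queue message from one S3 event record. The job name comes from this post; buildQueueMessage, the exact shape of the data object, and the "stands/<id>/..." key layout used to find the stand are assumptions for illustration, not the code from this project:

```javascript
// Builds the SQS message body for one S3 "object created" event record.
// The shape of `data` and the "stands/<id>/..." key layout are assumptions.
function buildQueueMessage(record) {
    // S3 URL-encodes object keys in event payloads and uses "+" for spaces.
    const path = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Hypothetical convention: the stand ID is the second path segment.
    const match = path.match(/^stands\/([^/]+)\//);

    return JSON.stringify({
        job: 'App\\Jobs\\CsvAddedToS3',
        data: {
            path: path,
            stand: match ? match[1] : null,
        },
    });
}
```

In a Lambda handler, this string would be sent to SQS as the message body for each entry in event.Records.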

The event implements the ShouldBroadcast interface and is broadcast on a channel specific to that stand. Laravel has excellent support for WebSocket communication, so be sure to read the official documentation on Broadcasting. If you are new to WebSockets, just know that they provide a means for your backend to push messages to your frontend without the frontend having to periodically poll the backend.

Frontend

I’ve used both Redis and Pusher for WebSockets. In this instance I’m using Pusher, but there is a relatively new package from BeyondCode and Spatie, Laravel WebSockets, that is a drop-in replacement for Pusher and that I hope to use soon.

Laravel Echo is a JavaScript library that abstracts away the headache of subscribing to channels and listening for events from Laravel. It supports Pusher out of the box and, with Laravel WebSockets being a replacement for Pusher, it will work with it, too. For more, see the official Laravel Echo documentation here.

Once Laravel Echo is configured, you can listen on a channel for a certain event (note: I’m using Vue below):

Echo.channel(`stand.${this.stand.id}`)
    .listen('S3CsvAdded', (e) => {
        this.loadNewFile(e.url);
    });

This will listen on the channel for the specific stand we want to view realtime data for and, when the event is received, a loadNewFile method is called on the Vue component passing the signed URL provided by the backend.

Highcharts

There are a number of good charting libraries out there but, based on my needs, I’ve settled on Highcharts because it is highly configurable and supports server-side rendering, which comes in handy if you need to generate a chart on the backend during a background process. You can learn more about Highcharts on the Highcharts website.

For the realtime data the client would be viewing, a multi-series line chart fit the bill. When the chart is instantiated, I include all of the series with empty data arrays. The Highcharts API is simple: when you add a data point, you can either repaint the chart immediately or defer repainting until later.
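
As a rough sketch of that setup, the helper below builds chart options with one empty series per channel. buildChartOptions, the channel names, the title, and the container ID are placeholders rather than the configuration used in this project:

```javascript
// Builds Highcharts options for a multi-series line chart whose series
// all start empty. Channel names and the title are placeholders.
function buildChartOptions(channelNames) {
    return {
        chart: { type: 'line' },
        title: { text: 'Realtime cycle data' },
        series: channelNames.map(function (name) {
            return { name: name, data: [] };
        }),
    };
}

// Usage (requires Highcharts to be loaded and a #chart element on the page):
// const hchart = Highcharts.chart('chart', buildChartOptions(['Channel 1', 'Channel 2']));
// hchart.series[0].addPoint(1.5, false); // add a point without repainting
// hchart.redraw();                       // repaint once when ready
```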

Papa Parse

In this project, each file averaged 56,000 data points. I wanted a simple way to download the file and do so without the problems associated with downloading a large file to the browser (slow, locking up the browser, etc.).

I found Papa Parse and couldn’t be happier with it. Check out the official documentation here. Here’s the code I used to download the file and update the chart:

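Papa.parse(url, {
	download: true,       // fetch the file from the signed URL
	fastMode: true,       // faster parsing; only safe when no fields are quoted
	worker: true,         // parse in a web worker to keep the page responsive
	dynamicTyping: true,  // convert numeric strings to numbers
	skipEmptyLines: true,
	step: function (results) {
		// Called once per row. Here results.data[0] is the row; in Papa Parse 5
		// and later, results.data is expected to be the row itself.
		results.data[0].forEach(function (item, i) {
			if (i > 0) { // first column is timestamp
				hchart.series[i - 1].addPoint(item, false); // false = don't redraw yet
			}
		});
	},
	complete: function () {
		hchart.redraw(); // repaint once after all rows have been added
	}
});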

Note: You’ll need logic to clear the series data each time a new file is added if you want to see only one cycle of data on the chart.
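
One way to do that clearing is with the setData method on each Highcharts series. This is only a sketch: clearSeries is a hypothetical helper that would be called at the top of loadNewFile, before Papa.parse:

```javascript
// Empties every series without repainting; the chart is redrawn later,
// once the new file's points have been added.
function clearSeries(chart) {
    chart.series.forEach(function (series) {
        series.setData([], false); // second argument: redraw
    });
}
```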

Here we’re streaming the file from S3, so the step function gets invoked with each data row. When adding the point to the chart series (FYI, the API may have changed in a newer Highcharts version), we’re passing false so the chart isn’t redrawn each time. Once the file download has completed, the complete callback is called and the chart is redrawn. If you’ve ever tried to stream a file from S3 to the browser, you’ll know it doesn’t work out of the box: there’s a little configuration to be done.
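
With tens of thousands of points per file, a variation worth considering is to buffer the values in plain arrays during step and hand each series its data once in complete, rather than calling addPoint for every value. This is an alternative sketch, not the code used in this project. createBuffers, bufferRow, and flushBuffers are hypothetical helper names, and each row is assumed to be a flat array with the timestamp in the first column:

```javascript
// One empty array per channel.
function createBuffers(channelCount) {
    return Array.from({ length: channelCount }, function () { return []; });
}

// Appends one parsed CSV row to the buffers, skipping the timestamp column.
function bufferRow(buffers, row) {
    for (let i = 1; i < row.length && i <= buffers.length; i++) {
        buffers[i - 1].push(row[i]);
    }
}

// Hands each series its buffered data, then repaints once.
function flushBuffers(chart, buffers) {
    buffers.forEach(function (data, i) {
        chart.series[i].setData(data, false);
    });
    chart.redraw();
}

// Usage inside the Papa.parse config (sketch):
// const buffers = createBuffers(hchart.series.length);
// step: function (results) { bufferRow(buffers, results.data[0]); },
// complete: function () { flushBuffers(hchart, buffers); }
```

Because setData replaces whatever a series held before, this approach also takes care of clearing the previous cycle.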

S3 CORS

To stream a file from S3, I had to add this CORS configuration to the bucket:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>Content-Range</AllowedHeader>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>

Note: You’ll probably want to lock down the AllowedOrigin and AllowedHeader but this will give you an idea of what is needed.
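
As a starting point for that lock-down, here is a sketch that produces a similar rule in the JSON form the S3 console accepts for CORS, restricted to a single origin. buildCorsRules and the example origin are hypothetical, and the allowed headers are left as permissive as in the XML above, since narrowing them depends on which request headers your client actually sends:

```javascript
// Builds a CORS rule list limited to one origin. The header wildcard is
// kept from the original configuration and should be tightened as needed.
function buildCorsRules(allowedOrigin) {
    return [
        {
            AllowedOrigins: [allowedOrigin],
            AllowedMethods: ['GET'],
            AllowedHeaders: ['*'],
            MaxAgeSeconds: 3000,
        },
    ];
}

console.log(JSON.stringify(buildCorsRules('https://app.example.com'), null, 2));
```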

Full Cycle

With this in place, here’s what happens:

  1. Manufacturing equipment runs a cycle.
  2. Data is collected and saved in a CSV file on Amazon S3.
  3. Once the file is saved on S3, an object-created event (such as s3:ObjectCreated:Put) is raised that triggers a Lambda function.
  4. The Lambda function takes the relevant data from the event payload and saves a message on Amazon SQS.
  5. The Laravel queue worker sees the message and hands it off to the proper job.
  6. The job generates a signed URL and passes it along with the stand ID to a Laravel event.
  7. The event payload is pushed through the WebSocket to the frontend.
  8. The frontend is listening on the channel for the event and, when it is received, the signed URL is passed to a method on the Vue component.
  9. Papa Parse downloads the file and the chart is updated.

Wrap up

I hope you found this helpful. If you have questions or something is unclear, reach out to me on Twitter @ballenkidd.