Time-Series Data: Troubleshooting and Caveats

Data Size Limits

Time-series data can grow more quickly than other data types, because its size depends on three factors:

  • Number of records (rows) in your dataset - such as number of states, number of sensors, etc.
  • Number of indicators for each record - such as development indicators per country, or measurements per sensor (temperature, humidity, etc.)
  • Number of samples (time-keys) for each indicator - such as daily, monthly, or yearly measurements.

Each of these factors has a recommended limit:

  1. # of Records (Rows)
    To keep a time-series visualization clear and interactive in the record view, use at most 200 to 250 records. This roughly matches the number of countries and regions in most datasets. The map view can support up to 4,000 unique records/regions when each record includes unique location information, such as a country name or latitude-longitude coordinates. If your data has more records, you can still use the list view to see records in sorted order and follow rankings over time.
  2. # of Samples (Time-keys)
    We recommend having at most 30-40 samples (time-keys) for each indicator. With yearly data, that covers about 40 years. With monthly data, a 36-month period can be visualized effectively.
  3. # of Indicators (Attributes)
    The number of indicators can also quickly increase your data size. For example, if you have 100 indicators for 200 countries over 40 years, you will have 800,000 individual values to explore! We recommend keeping your number of indicators below 100.
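
The arithmetic behind these limits can be checked before loading a dataset. The following is a minimal sketch, not part of Keshif: the function names and the warning messages are hypothetical, and the thresholds are simply the recommendations listed above.

```python
# Recommended upper bounds, taken from the guidelines above.
MAX_RECORDS_RECORD_VIEW = 250
MAX_RECORDS_MAP_VIEW = 4000
MAX_SAMPLES = 40
INDICATOR_GUIDELINE = 100  # the guideline is to stay below this number


def total_values(records: int, indicators: int, samples: int) -> int:
    """Number of individual values: records x indicators x samples."""
    return records * indicators * samples


def size_warnings(records: int, indicators: int, samples: int) -> list[str]:
    """Return one (hypothetical) warning per guideline the dataset exceeds."""
    warnings = []
    if records > MAX_RECORDS_MAP_VIEW:
        warnings.append("too many records for the map view; use the list view")
    elif records > MAX_RECORDS_RECORD_VIEW:
        warnings.append("too many records for a clear record view")
    if samples > MAX_SAMPLES:
        warnings.append("more than 40 samples (time-keys) per indicator")
    if indicators >= INDICATOR_GUIDELINE:
        warnings.append("indicator count is not below 100")
    return warnings


# The example from the text: 100 indicators, 200 countries, 40 years.
print(total_values(records=200, indicators=100, samples=40))  # 800000
print(size_warnings(records=200, indicators=50, samples=40))  # []
```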

Suggested Strategies

Use boost when possible: When you have many indicators (time-series data attributes), we suggest that you use the boost feature to detect multiple time-series attributes at once, as processing each individually will take a little more time.

Use partial data load: If your data has many indicators and samples, resulting in hundreds of columns, the first data load will take a significant amount of time, even if your total data size is not large. To work through such datasets more quickly, we recommend the following:

  1. Load data partially, only keeping 3-4 time-keys. This will remove most of your data columns.
  2. Make sure your time-series data only contains numeric values.
  3. Use the boost feature to detect multiple time-series at once.
  4. Update your data source with all columns (replacing the partial data).
  5. Reload the dashboard - Keshif will detect additional time-keys in your data, and the dashboard will load quickly.
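
Steps 1 and 2 above can be prepared outside Keshif. The sketch below is only an illustration under a loud assumption: the data is a wide CSV whose time-series columns are named `<indicator>_<time-key>` with a numeric time-key (for example `gdp_2003`). That naming scheme and the helper names are hypothetical; adapt them to how your own columns are labeled.

```python
import csv
import io


def split_column(name: str):
    """Split 'indicator_timekey' into its parts; return None for other columns."""
    indicator, sep, time_key = name.rpartition("_")
    if sep and indicator and time_key.isdigit():
        return indicator, time_key
    return None


def partial_csv(text: str, keep: int = 3) -> str:
    """Keep non-time-series columns plus the columns for the latest `keep` time-keys."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]

    # Collect every time-key found in the header, then keep only the latest few.
    time_keys = sorted({split_column(c)[1] for c in header if split_column(c)}, key=int)
    kept_keys = set(time_keys[-keep:])

    kept_idx = [
        i for i, c in enumerate(header)
        if split_column(c) is None or split_column(c)[1] in kept_keys
    ]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in kept_idx])
    return out.getvalue()


def non_numeric_cells(text: str) -> list[tuple[int, str, str]]:
    """List (line number, column, value) for time-series cells that are not numeric.

    Empty cells are treated as missing values and are not reported (an assumption).
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    problems = []
    for line_number, row in enumerate(rows[1:], start=2):
        for name, value in zip(header, row):
            if split_column(name) and value.strip():
                try:
                    float(value)
                except ValueError:
                    problems.append((line_number, name, value))
    return problems


SAMPLE = """country,gdp_2000,gdp_2001,gdp_2002,gdp_2003
A,1,2,3,4
B,5,n/a,7,8
"""

print(partial_csv(SAMPLE, keep=2))
print(non_numeric_cells(SAMPLE))  # [(3, 'gdp_2001', 'n/a')]
```

The reduced file would be used for the first load, and the full file would replace it in step 4.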

Customized data loading pipelines: If you need to support a larger number of indicators extracted from a larger database, please contact us. We can create customized solutions that load indicators incrementally from larger databases.

Analysis Feature Limitations

Currently, you cannot zoom into the timeline, for example to focus on a specific range of years.

Keshif also cannot aggregate multiple time-keys into a single key. For example, it cannot turn monthly values into yearly trends, such as the average temperature per year, or the total yearly immigration obtained by summing individual months. If you need this feature, please contact us, and we'll be happy to work with you.
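
To make the kind of aggregation described above concrete, here is a minimal sketch of doing it on the data before it is loaded. This is not a Keshif feature. It assumes monthly time-keys are written as `YYYY-MM` strings, and the function name `aggregate_yearly` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_yearly(monthly: dict[str, float], how: str = "mean") -> dict[str, float]:
    """Collapse 'YYYY-MM' time-keys into 'YYYY' keys by averaging or summing."""
    by_year = defaultdict(list)
    for time_key, value in monthly.items():
        year = time_key.split("-")[0]
        by_year[year].append(value)

    # "mean" suits measurements such as temperature; "sum" suits counts such as immigration.
    combine = {"mean": mean, "sum": sum}[how]
    return {year: combine(values) for year, values in sorted(by_year.items())}


monthly_values = {"2020-01": 10, "2020-02": 20, "2021-01": 5}
print(aggregate_yearly(monthly_values, how="sum"))   # {'2020': 30, '2021': 5}
print(aggregate_yearly(monthly_values, how="mean"))  # {'2020': 15, '2021': 5}
```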
