Working with CSV data

This tutorial describes how to import your CSV data into a Phovea application and explains the different types of CSV data.

Preparation

  1. Create a new data directory in your Phovea application directory.
  2. Create a new my_phovea_app/data/index.json that will contain an array of metadata for the CSV files.
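
The two preparation steps can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name init_data_dir is hypothetical and not part of Phovea, and the application directory name is taken from the example path above.

```python
from pathlib import Path

# Hypothetical helper, not part of Phovea: creates the data directory and
# an index.json holding an empty JSON array, unless the file already exists.
def init_data_dir(app_dir):
    data_dir = Path(app_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    index_file = data_dir / "index.json"
    if not index_file.exists():
        index_file.write_text("[]\n", encoding="utf-8")
    return index_file

if __name__ == "__main__":
    print(init_data_dir("my_phovea_app"))
```

An existing index.json is left untouched, so the script can be run repeatedly without losing registered datasets.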

Below we distinguish between different data types: Table, Matrix, Vector, and Stratification.

Table

A table contains multiple columns with different data types in one CSV file (e.g., user name, age, ...).

Place this users.csv in your data directory:

user_id, username, age
user_0, User A, 18
user_1, User B, 54
user_2, User C, 47
user_3, User D, 27
user_4, User E, 58
user_5, User F, 29
user_6, User G, 68
user_7, User H, 34
user_8, User I, 21
user_9, User J, 94

Heads up!

Phovea requires an id column (string or int) as the first column for this data type!


Now we have to register this file in the index.json and add some metadata.

[
  {
    "name": "User Data",
    "description": "Some user attributes",
    "path": "users.csv",
    "separator": ",",
    "type": "table",
    "size": [10, 3],
    "idtype": "Users",
    "columns": [
      {
        "name": "username",
        "value": {
          "type": "string"
        }
      },
      {
        "name": "age",
        "value": {
          "type": "int",
          "range": [0, 100]
        }
      }
    ]
  }
]

For this example we assume that the index.json and the users.csv are stored in the same data directory. If they are not, adapt the path to the CSV file accordingly. Make sure to add an idtype and the size of the table. Each column consists of a name that is used for later reference and a value type (i.e., string, int, real).
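
Because the metadata is written by hand, it can drift from the CSV file. The following is a minimal sketch of a consistency check, using only the Python standard library. The function check_table_entry is hypothetical and not part of Phovea. It assumes, as in the example above, that size is [rows, columns] with the id column counted, and that the columns array lists every column except the id column.

```python
import csv
import io

# Hypothetical helper, not part of Phovea: compares a "table" entry from
# index.json with the CSV text it describes and returns a list of problems.
def check_table_entry(entry, csv_text):
    # skipinitialspace drops the blank after each separator ("a, b" -> "b")
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]
    problems = []
    n_rows, n_cols = entry["size"]
    if len(rows) != n_rows:
        problems.append(f"expected {n_rows} rows, found {len(rows)}")
    # Assumption: size counts the id column, as in the users.csv example.
    if len(header) != n_cols:
        problems.append(f"expected {n_cols} columns, found {len(header)}")
    # The first CSV column is the id column; the rest must match "columns".
    declared = [column["name"] for column in entry["columns"]]
    if header[1:] != declared:
        problems.append(f"column names differ: {header[1:]} vs {declared}")
    return problems

entry = {
    "size": [2, 3],
    "columns": [{"name": "username"}, {"name": "age"}],
}
sample = "user_id, username, age\nuser_0, User A, 18\nuser_1, User B, 54\n"
print(check_table_entry(entry, sample))
```

An empty list means the entry and the file agree on the row count, the column count, and the column names.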


Heads up!

After changing the source data or the index.json, you have to restart the Phovea server by running docker-compose restart api from the workspace or project directory.


You can now access the data directly through the Phovea REST API.

Dataset

TODO Describe the TypeScript API; there is no need to use the REST API directly.

  • /api/dataset/ returns the metadata of all available datasets including an automatically generated id
  • /api/dataset/<dataset_id> and /api/dataset/table/<dataset_id>/data return the formatted data for the given dataset id
  • /api/dataset/table/<dataset_id> returns the metadata for the given dataset id
  • /api/dataset/table/<dataset_id>/rows returns a list of all row ids from the dataset
  • /api/dataset/table/<dataset_id>/rowIds returns the ids in the Phovea range format (e.g., (0:10))
  • /api/dataset/table/<dataset_id>/raw returns the JSON data for the given dataset id
  • /api/dataset/table/<dataset_id>/col/<column_name> returns the data for a column of the given dataset id
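
To try the table endpoints listed above from a script, the following minimal sketch builds the URLs and fetches JSON with the Python standard library. The base URL, the dataset id in the usage lines, and the helper names table_url and fetch_json are assumptions for illustration; replace them with the address of your server and an id returned by /api/dataset/.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the API is reachable at this address; adjust to your setup.
BASE_URL = "http://localhost:9000"

def table_url(dataset_id, suffix=""):
    """Build the URL of a table endpoint, e.g. suffix "rows" or "col/age"."""
    url = f"{BASE_URL}/api/dataset/table/{quote(dataset_id, safe='')}"
    return f"{url}/{suffix}" if suffix else url

def fetch_json(url):
    """Request the URL and decode the JSON response (needs a running server)."""
    with urlopen(url) as response:
        return json.load(response)

# "my_dataset_id" is a placeholder for an automatically generated id.
print(table_url("my_dataset_id"))
print(table_url("my_dataset_id", "rows"))
print(table_url("my_dataset_id", "col/age"))
```

With a running server, fetch_json(table_url(dataset_id, "raw")) would return the decoded response of the raw endpoint.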

Views

  • /table/<dataset_id>/view/<view_name> returns the metadata of the view
  • /table/<dataset_id>/view/<view_name>/raw returns the JSON data for the given view of the dataset
  • /table/<dataset_id>/view/<view_name>/rows returns a list of all row ids found for the view of the dataset
  • /table/<dataset_id>/view/<view_name>/rowIds returns the ids in the Phovea range format (e.g., (0:10)) for the view of the dataset

TODO Explain how to define the view in the index.json.

Matrix

In contrast to a table, all columns of a matrix have the same data type (e.g., int or real).

Place this time-series.csv in your data directory:

user_id, 2010, 2011, 2012, 2013, 2014, 2015
user_0, 18, 34, 57, 32, 25, 46
user_1, 95, 41, 15, 43, 82, 44
user_2, 57, 46, 37, 54, 25, 86
user_3, 34, 93, 68, 41, 54, 18
user_4, 68, 23, 32, 69, 12, 39
user_5, 34, 12, 49, 80, 11, 58
user_6, 21, 58, 30, 99, 68, 17
user_7, 84, 85, 60, 48, 48, 38
user_8, 71, 17, 48, 20, 60, 39
user_9, 72, 69, 23, 57, 53, 56

Heads up!

Phovea requires an id column (string or int) as the first column for this data type!


Now we have to register this file in the index.json and add some metadata.

[
  {
    "name": "Performance Time Series",
    "description": "User performance over time",
    "path": "time-series.csv",
    "separator": ",",
    "type": "matrix",
    "size": [10, 6],
    "rowtype": "Users",
    "coltype": "Years",
    "value": {
      "type": "int",
      "range": [0, 100]
    }
  }
]
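
As with tables, the matrix metadata can be checked against the CSV file before restarting the server. The following is a minimal sketch using only the Python standard library. The function check_matrix_entry is hypothetical and not part of Phovea. It assumes, as in the example above, that size is [rows, value columns] without the id column, and that the value type is int.

```python
import csv
import io

# Hypothetical helper, not part of Phovea: compares a "matrix" entry from
# index.json with the CSV text it describes and returns a list of problems.
def check_matrix_entry(entry, csv_text):
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]
    problems = []
    n_rows, n_cols = entry["size"]
    if len(rows) != n_rows:
        problems.append(f"expected {n_rows} rows, found {len(rows)}")
    # Assumption: size does not count the id column, as in time-series.csv.
    if len(header) - 1 != n_cols:
        problems.append(f"expected {n_cols} columns, found {len(header) - 1}")
    low, high = entry["value"]["range"]
    for row in rows:
        row_id, cells = row[0], row[1:]
        for column, cell in zip(header[1:], cells):
            try:
                value = int(cell)
            except ValueError:
                problems.append(f"{row_id}/{column}: {cell!r} is not an int")
                continue
            if not low <= value <= high:
                problems.append(f"{row_id}/{column}: {value} is out of range")
    return problems

entry = {"size": [2, 3], "value": {"type": "int", "range": [0, 100]}}
sample = "user_id, 2010, 2011, 2012\nuser_0, 18, 34, 57\nuser_1, 95, 41, 15\n"
print(check_matrix_entry(entry, sample))
```

An empty list means the shape matches and every cell is an integer inside the declared range.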

You can now access the data directly through the Phovea REST API.

Dataset

  • /api/dataset/ returns the metadata of all available datasets including an automatically generated id
  • /api/dataset/<dataset_id> and /api/dataset/matrix/<dataset_id>/data return the formatted data for the given dataset id
  • /api/dataset/matrix/<dataset_id> returns the metadata for the given dataset id
  • /api/dataset/matrix/<dataset_id>/rows returns a list of all row ids from the dataset
  • /api/dataset/matrix/<dataset_id>/rowIds returns the ids in the Phovea range format (e.g., (0:10))
  • /api/dataset/matrix/<dataset_id>/cols returns a list of all column ids from the dataset
  • /api/dataset/matrix/<dataset_id>/colIds returns the ids in the Phovea range format (e.g., (0:10))
  • /api/dataset/matrix/<dataset_id>/raw returns the JSON data for the given dataset id
  • /api/dataset/matrix/<dataset_id>/hist returns a histogram for the matrix data
  • /api/dataset/matrix/<dataset_id>/stats returns statistical values of the matrix data (e.g., q1, q3, min, max, sum, median, mean, skewness)
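
To cross-check the values reported by the stats endpoint, some of the statistics can be computed locally. The following is a minimal sketch using the Python statistics module. The function local_stats is hypothetical, covers only min, max, sum, mean, and median, and makes no claim about how the server computes its values.

```python
import statistics

# Hypothetical helper, not part of Phovea: summarizes all cells of a matrix
# given as a list of rows (the id column already removed).
def local_stats(matrix):
    values = [value for row in matrix for value in row]
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Values of user_0 from time-series.csv
print(local_stats([[18, 34, 57, 32, 25, 46]]))
```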

Vector

TODO

Stratification

TODO