<!-- TITLE: Working with CSV data -->
This tutorial describes how to import your CSV data into a Phovea application and explains the different types of CSV data.
## Preparation
1. Create a new `data` directory in your Phovea application directory.
2. Create a new file `my_phovea_app/data/index.json` that will contain an array of metadata for the CSV files.
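
The two preparation steps can also be scripted. The following is a minimal sketch, not part of Phovea itself: the helper name `init_data_dir` is hypothetical, and it assumes that an empty JSON array is an acceptable starting point for `index.json`.

```python
import json
from pathlib import Path


def init_data_dir(app_dir):
    """Create <app_dir>/data and an empty index.json if they do not exist yet."""
    data_dir = Path(app_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    index_file = data_dir / "index.json"
    if not index_file.exists():
        # Assumption: the index is a JSON array, so start with no datasets.
        index_file.write_text(json.dumps([]) + "\n", encoding="utf-8")
    return index_file
```

Calling `init_data_dir("my_phovea_app")` from the parent directory of the application leaves an existing `index.json` untouched, so it is safe to run repeatedly.
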
Below we distinguish between different data types: [Table](#table), [Matrix](#matrix), [Vector](#vector), and [Stratification](#stratification).
## Table
A table contains multiple columns with different data types in one CSV file (e.g., user name, age).
Place this `users.csv` in your data directory:
```csv
user_id, username, age
user_0, User A, 18
user_1, User B, 54
user_2, User C, 47
user_3, User D, 27
user_4, User E, 58
user_5, User F, 29
user_6, User G, 68
user_7, User H, 34
user_8, User I, 21
user_9, User J, 94
```
---
**Heads up!**
Phovea requires an `id` column (of type `string` or `int`) as the first column for this data type!
---
Now we have to register this file in the `index.json` and add some metadata.
```json
[
{
"name": "User Data",
"description": "Some user attributes",
"path": "users.csv",
"separator": ",",
"type": "table",
"size": [10, 3],
"idtype": "Users",
"columns": [
{
"name": "username",
"value": {
"type": "string"
}
},
{
"name": "age",
"value": {
"type": "int",
"range": [0, 100]
}
}
]
}
]
```
For this example we assume that `index.json` and `users.csv` are stored in the same `data` directory. Otherwise, adapt the `path` to the CSV file accordingly. Make sure to add an `idtype` and the `size` of the table. Each column consists of a name that is used for later reference and a value type (i.e., `string`, `int`, `real`).
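
Because the metadata is written by hand, it can drift from the CSV file. The sketch below is a hypothetical checker (`check_table_entry` is not a Phovea function) that compares one `index.json` entry with the CSV text. It assumes that the first number in `size` is the number of data rows, as in the example above, and it deliberately does not check the column count. Skipping the space after each separator is a choice made for this checker only and says nothing about how Phovea parses the file.

```python
import csv
import io


def check_table_entry(entry, csv_text):
    """Return a list of mismatches between a table entry and its CSV text."""
    reader = csv.reader(
        io.StringIO(csv_text),
        delimiter=entry["separator"],
        skipinitialspace=True,  # the example CSV has a space after each comma
    )
    rows = [row for row in reader if row]  # ignore blank lines
    header, body = rows[0], rows[1:]
    problems = []

    # Assumption: size[0] is the number of data rows (header excluded).
    declared_rows = entry["size"][0]
    if len(body) != declared_rows:
        problems.append(
            f"size declares {declared_rows} rows, CSV has {len(body)}"
        )

    # Every column registered in the metadata should exist in the header.
    for column in entry["columns"]:
        if column["name"] not in header:
            problems.append(f"column '{column['name']}' is not in the CSV header")

    return problems
```

An empty result means that the row count and the registered column names agree with the file.
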
---
**Heads up!**
After changing the source data or the `index.json`, you have to restart the Phovea server using `docker-compose restart api` from the workspace or project directory.
---
You can now access the data directly through the Phovea REST API.
**Dataset**
**TODO** Describe the TypeScript API; there is no need to use the REST API directly.
* `/api/dataset/` returns the metadata of all available datasets including an automatically generated `id`
* `/api/dataset/<dataset_id>` and `/api/dataset/table/<dataset_id>/data` return the formatted data for the given dataset id
* `/api/dataset/table/<dataset_id>` returns the metadata for the given dataset id
* `/api/dataset/table/<dataset_id>/rows` returns a list of all row ids from the dataset
* `/api/dataset/table/<dataset_id>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/table/<dataset_id>/raw` returns the JSON data for the given dataset id
* `/api/dataset/table/<dataset_id>/col/<column_name>` returns the data for a column of the given dataset id
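
The table endpoints listed above follow one pattern, so a small helper can build them. This is a sketch under assumptions: the base URL (`http://localhost:9000` in the usage note) and the dataset id `userdata` are placeholders, since the real id is generated by the server and appears in the response of `/api/dataset/`. The function names are hypothetical as well.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def table_endpoints(base_url, dataset_id):
    """Build the table URLs from the list above for one dataset id."""
    root = f"{base_url.rstrip('/')}/api/dataset/table/{quote(dataset_id, safe='')}"
    return {
        "metadata": root,
        "data": f"{root}/data",
        "rows": f"{root}/rows",
        "rowIds": f"{root}/rowIds",
        "raw": f"{root}/raw",
    }


def column_endpoint(base_url, dataset_id, column_name):
    """Build the URL that returns a single column of the table."""
    root = table_endpoints(base_url, dataset_id)["metadata"]
    return f"{root}/col/{quote(column_name, safe='')}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (requires a running server)."""
    with urlopen(url) as response:
        return json.load(response)
```

With a running server, `fetch_json(column_endpoint("http://localhost:9000", "userdata", "age"))` would request the `age` column, provided that the placeholder host and dataset id match your setup.
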
**Views**
* `/table/<dataset_id>/view/<view_name>` returns the metadata of the view
* `/table/<dataset_id>/view/<view_name>/raw` returns the JSON data for the given view of the dataset
* `/table/<dataset_id>/view/<view_name>/rows` returns a list of all row ids found for the view of the dataset
* `/table/<dataset_id>/view/<view_name>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`) for the view of the dataset
**TODO** Explain how to define the view in the `index.json`.
## Matrix
In contrast to a [table](#table), all columns of a matrix have the same data type (e.g., `int` or `real`).
Place this `time-series.csv` in your data directory:
```csv
user_id, 2010, 2011, 2012, 2013, 2014, 2015
user_0, 18, 34, 57, 32, 25, 46
user_1, 95, 41, 15, 43, 82, 44
user_2, 57, 46, 37, 54, 25, 86
user_3, 34, 93, 68, 41, 54, 18
user_4, 68, 23, 32, 69, 12, 39
user_5, 34, 12, 49, 80, 11, 58
user_6, 21, 58, 30, 99, 68, 17
user_7, 84, 85, 60, 48, 48, 38
user_8, 71, 17, 48, 20, 60, 39
user_9, 72, 69, 23, 57, 53, 56
```
---
**Heads up!**
Phovea requires an `id` column (of type `string` or `int`) as the first column for this data type!
---
Now we have to register this file in the `index.json` and add some metadata.
```json
[
{
"name": "Performance Time Series",
"description": "User performance over time",
"path": "time-series.csv",
"separator": ",",
"type": "matrix",
"size": [10, 6],
"rowtype": "Users",
"coltype": "Years",
"value": {
"type": "int",
"range": [0, 100]
}
}
]
```
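
The matrix metadata can be checked against the CSV file in the same spirit. The sketch below uses a hypothetical helper, `check_matrix_entry`. It assumes that `size` is `[rows, value columns]` with the id column excluded, which matches the example (10 users, 6 years), and that `range` is inclusive on both ends; neither assumption is confirmed by the Phovea documentation here.

```python
import csv
import io


def check_matrix_entry(entry, csv_text):
    """Return a list of mismatches between a matrix entry and its CSV text."""
    reader = csv.reader(
        io.StringIO(csv_text),
        delimiter=entry["separator"],
        skipinitialspace=True,  # the example CSV has a space after each comma
    )
    rows = [row for row in reader if row]  # ignore blank lines
    header, body = rows[0], rows[1:]
    n_rows, n_cols = entry["size"]
    problems = []

    if len(body) != n_rows:
        problems.append(f"size declares {n_rows} rows, CSV has {len(body)}")
    # Assumption: the first CSV column holds the ids and is not counted.
    if len(header) - 1 != n_cols:
        problems.append(f"size declares {n_cols} columns, CSV has {len(header) - 1}")

    # Assumption: the range bounds are inclusive.
    low, high = entry["value"]["range"]
    for row in body:
        for label, cell in zip(header[1:], row[1:]):
            value = int(cell)  # raises ValueError for non-integer cells
            if not low <= value <= high:
                problems.append(f"{row[0]}/{label}: {value} outside [{low}, {high}]")

    return problems
```

Each reported problem names the row id and column label, which makes it easier to find the offending cell in a larger file.
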
You can now access the data directly through the Phovea REST API.
**Dataset**
* `/api/dataset/` returns the metadata of all available datasets including an automatically generated `id`
* `/api/dataset/<dataset_id>` and `/api/dataset/matrix/<dataset_id>/data` return the formatted data for the given dataset id
* `/api/dataset/matrix/<dataset_id>` returns the metadata for the given dataset id
* `/api/dataset/matrix/<dataset_id>/rows` returns a list of all row ids from the dataset
* `/api/dataset/matrix/<dataset_id>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/matrix/<dataset_id>/cols` returns a list of all column ids from the dataset
* `/api/dataset/matrix/<dataset_id>/colIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/matrix/<dataset_id>/raw` returns the JSON data for the given dataset id
* `/api/dataset/matrix/<dataset_id>/hist` returns a histogram for the matrix data
* `/api/dataset/matrix/<dataset_id>/stats` returns statistical values of the matrix data (e.g., q1, q3, min, max, sum, median, mean, skewness)
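
To sanity-check the `/stats` response, some of the listed values can be recomputed locally. The following is a rough sketch with a hypothetical helper name. It covers only min, max, sum, mean, and median, because the exact quartile and skewness definitions used by the server are not described here, and the field names of the actual response may differ from the keys used below.

```python
import statistics


def basic_matrix_stats(matrix):
    """Compute simple statistics over all cells of a list-of-lists matrix."""
    values = [value for row in matrix for value in row]
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
```

The input is the numeric part of the matrix only, that is, without the header row and the id column.
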
## Vector
TODO
## Stratification
TODO