<!-- TITLE: Working with CSV data -->

This tutorial describes how to import your CSV data into a Phovea application and explains the different types of CSV data.

## Preparation

1. Create a new `data` directory in your Phovea application directory.
2. Create a new `my_phovea_app/data/index.json` that will contain an array of metadata for the CSV files.

Below we distinguish between different data types: [Table](#table), [Matrix](#matrix), [Vector](#vector), and [Stratification](#stratification).

## Table

A table contains multiple columns with different data types in one CSV file (e.g., user name, age, ...).

Place this `users.csv` in your data directory:

```csv
user_id, username, age
user_0, User A, 18
user_1, User B, 54
user_2, User C, 47
user_3, User D, 27
user_4, User E, 58
user_5, User F, 29
user_6, User G, 68
user_7, User H, 34
user_8, User I, 21
user_9, User J, 94
```

---

**Heads up!** Phovea requires an `id` column of type `string` or `int` as the first column for this data type!

---

Now we have to register this file in the `index.json` and add some metadata.

```json
[
  {
    "name": "User Data",
    "description": "Some user attributes",
    "path": "users.csv",
    "separator": ",",
    "type": "table",
    "size": [10, 3],
    "idtype": "Users",
    "columns": [
      {
        "name": "username",
        "value": {
          "type": "string"
        }
      },
      {
        "name": "age",
        "value": {
          "type": "int",
          "range": [0, 100]
        }
      }
    ]
  }
]
```

For this example we assume that the `index.json` and the `users.csv` are stored in the same `data` directory. If they are not, adapt the `path` to the CSV file accordingly. Make sure to add an `idtype` and the `size` of the table. Each column consists of a name that is used for later reference and a value type (i.e., `string`, `int`, or `real`).

---

**Heads up!** After changing the source data or the `index.json` you have to restart the Phovea server using `docker-compose restart api` from the workspace or project directory.

---

You can now access the data directly through the Phovea REST API.
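Before querying the API, it can be worth checking that the metadata entry still agrees with the CSV file, since `size` and the column names are maintained by hand. The following is a minimal sketch using only the Python standard library. The helper names `read_csv_shape` and `check_table_entry` are hypothetical and not part of Phovea, and stripping the spaces around cells is an assumption about how the file should be read. The sketch only compares the row count and the declared column names, because it is not assumed here whether the id column counts towards the second `size` entry.

```python
import csv
import io
import json

# Contents of users.csv from the example above.
USERS_CSV = """user_id, username, age
user_0, User A, 18
user_1, User B, 54
user_2, User C, 47
user_3, User D, 27
user_4, User E, 58
user_5, User F, 29
user_6, User G, 68
user_7, User H, 34
user_8, User I, 21
user_9, User J, 94
"""

# The table entry from index.json (first element of the array).
ENTRY = json.loads("""
{
  "name": "User Data",
  "path": "users.csv",
  "separator": ",",
  "type": "table",
  "size": [10, 3],
  "idtype": "Users",
  "columns": [
    {"name": "username", "value": {"type": "string"}},
    {"name": "age", "value": {"type": "int", "range": [0, 100]}}
  ]
}
""")


def read_csv_shape(text, separator=","):
    """Return (header, number_of_data_rows), with spaces around cells stripped."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, data = rows[0], rows[1:]
    return header, len(data)


def check_table_entry(entry, csv_text):
    """Return a list of mismatches; an empty list means none were found."""
    header, n_rows = read_csv_shape(csv_text, entry.get("separator", ","))
    problems = []
    if entry["size"][0] != n_rows:
        problems.append(f"size declares {entry['size'][0]} rows, CSV has {n_rows}")
    # The first CSV column is the id column; declared columns must appear after it.
    declared = [col["name"] for col in entry["columns"]]
    missing = [name for name in declared if name not in header[1:]]
    if missing:
        problems.append(f"columns not found in CSV header: {missing}")
    return problems


print(check_table_entry(ENTRY, USERS_CSV))  # prints [] for the example files above
```

In a real project you would read `index.json` and the CSV file from the `data` directory instead of embedding them as strings.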
**Dataset**

📕 What about the TypeScript API? There is no need to use the REST API directly.

* `/api/dataset/` returns the metadata of all available datasets including an automatically generated `id`
* `/api/dataset/<dataset_id>` and `/api/dataset/table/<dataset_id>/data` return the formatted data for the given dataset id
* `/api/dataset/table/<dataset_id>` returns the metadata for the given dataset id
* `/api/dataset/table/<dataset_id>/rows` returns a list of all row ids from the dataset
* `/api/dataset/table/<dataset_id>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/table/<dataset_id>/raw` returns the JSON data for the given dataset id
* `/api/dataset/table/<dataset_id>/col/<column_name>` returns the data for a column of the given dataset id

**Views**

* `/table/<dataset_id>/view/<view_name>` returns the metadata of the view
* `/table/<dataset_id>/view/<view_name>/raw` returns the JSON data for the given view of the dataset
* `/table/<dataset_id>/view/<view_name>/rows` returns a list of all row ids found for the view of the dataset
* `/table/<dataset_id>/view/<view_name>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`) for the view of the dataset

**TODO** Explain how to define the view in the `index.json`.

## Matrix

In contrast to a [table](#table), all columns of a matrix have the same data type (e.g., `int` or `real`).

Place this `time-series.csv` in your data directory:

```csv
user_id, 2010, 2011, 2012, 2013, 2014, 2015
user_0, 18, 34, 57, 32, 25, 46
user_1, 95, 41, 15, 43, 82, 44
user_2, 57, 46, 37, 54, 25, 86
user_3, 34, 93, 68, 41, 54, 18
user_4, 68, 23, 32, 69, 12, 39
user_5, 34, 12, 49, 80, 11, 58
user_6, 21, 58, 30, 99, 68, 17
user_7, 84, 85, 60, 48, 48, 38
user_8, 71, 17, 48, 20, 60, 39
user_9, 72, 69, 23, 57, 53, 56
```

---

**Heads up!** Phovea requires an `id` column of type `string` or `int` as the first column for this data type!

---

Now we have to register this file in the `index.json` and add some metadata.

```json
[
  {
    "name": "Performance Time Series",
    "description": "User performance over time",
    "path": "time-series.csv",
    "separator": ",",
    "type": "matrix",
    "size": [10, 6],
    "rowtype": "Users",
    "coltype": "Years",
    "value": {
      "type": "int",
      "range": [0, 100]
    }
  }
]
```

You can now access the data directly through the Phovea REST API.

**Dataset**

* `/api/dataset/` returns the metadata of all available datasets including an automatically generated `id`
* `/api/dataset/<dataset_id>` and `/api/dataset/matrix/<dataset_id>/data` return the formatted data for the given dataset id
* `/api/dataset/matrix/<dataset_id>` returns the metadata for the given dataset id
* `/api/dataset/matrix/<dataset_id>/rows` returns a list of all row ids from the dataset
* `/api/dataset/matrix/<dataset_id>/rowIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/matrix/<dataset_id>/cols` returns a list of all column ids from the dataset
* `/api/dataset/matrix/<dataset_id>/colIds` returns the ids in the Phovea range format (e.g., `(0:10)`)
* `/api/dataset/matrix/<dataset_id>/raw` returns the JSON data for the given dataset id
* `/api/dataset/matrix/<dataset_id>/hist` returns a histogram for the matrix data
* `/api/dataset/matrix/<dataset_id>/stats` returns statistical values of the matrix data (e.g., q1, q3, min, max, sum, median, mean, skewness)

## Vector

TODO

## Stratification

TODO