Modern replacement for csv for the year 2021
version: 0.0.8 date: 2021-07-16 author: bestia.dev repository: GitHub
Hashtags: #rustlang #tutorial My projects on Github are more like a tutorial than a finished product: bestia-dev tutorials.
My proposed format for import/export of two-dimensional database tables.
Difference between QVS20 and QVS21
QVS stands for "sQuare brackets Separated Values".
20 is the year of the first version: 2020.
21 is the year of the second version: 2021.
The second version has everything from the first version plus added support for SubTables.
It is not an update, but a more complex version.
For most projects QVS20 is enough. The code is smaller, faster, simpler.
Use QVS21 only if SubTables are absolutely needed.
Except for SubTables these versions are 100% compatible.
Find the repository for QVS20 here: https://github.com/bestia-dev/QVS20.
Data type SubTable
It is possible to insert a whole sub-table inside one cell of the table.
Just like that. No special escaping is needed, because we have start and end delimiters.
So we can also represent hierarchical data, if it is really needed.
Still the primary goal of the standard is tabular data.
One tiny thing must change, though: the row delimiter of the sub-table is not \n anymore.
It is the digit that represents the depth of the sub-table: 1, 2, 3,...
It must be only one byte, so this format is not suited to hierarchies deeper than 9 levels.
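The rule above can be sketched in Rust as a small lookup. This is only an illustration, not code from the QVS21 library: the function name `row_delimiter` and the convention that depth 0 means the top-level table are assumptions made for this example.

```rust
/// Hypothetical helper (not part of the QVS21 crate): returns the one-byte
/// row delimiter for a given nesting depth.
/// Assumption: depth 0 is the top-level table, which uses LF.
/// Depths 1 to 9 use the ASCII digit of the depth; deeper levels are rejected.
fn row_delimiter(depth: u8) -> Option<u8> {
    match depth {
        0 => Some(b'\n'),
        1..=9 => Some(b'0' + depth),
        _ => None, // a two-digit depth would no longer fit in one byte
    }
}
```

Returning `None` for depth 10 and above makes the one-byte limit explicit instead of silently producing an invalid delimiter.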
Example of CountryTable:
[Country][Population]
[Slovenia]
[Italia]
[Croatia]
Now we want to insert the data of the cities, but hierarchically as a sub-table.
Example of CityDataTable for Slovenia:
[City][MetropolitanPopulation]
[Ljubljana]
[Koper]
Together, the table and the sub-table look like this:
The sub-table row delimiter is changed to "1".
For easy parsing, the sub-table starts with the new row delimiter.
[Country][CityDataSubTable][Population]
[Slovenia][1[Ljubljana]1[Koper]1]
[Italia][1[Milano]1[Venezia]1]
This is not very human readable, because of the lack of spaces.
But you can visualize it like this:
[Country ] [CityDataSubTable ] [Population]
[Slovenia] [1[Ljubljana]1 [Koper ][ 30000]1 ] [ 2000000]
[Italia ] [1[Milano ]1 [Venezia ][ 30000]1 ] [ 60000000]
...
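To show how the depth-specific row delimiter makes nested parsing straightforward, here is a minimal Rust sketch. It is not the parser from the QVS21 repository. The function name `split_rows` is hypothetical, and the sketch assumes that `[` and `]` never appear unescaped inside field text (the escaping rules are not covered in this section).

```rust
/// Hypothetical sketch: splits the text of a table (depth 0) or of a
/// sub-table cell (depth 1 to 9) into rows of raw field strings.
/// Assumption: brackets inside field text are always balanced or escaped.
fn split_rows(text: &str, depth: u8) -> Vec<Vec<String>> {
    assert!(depth <= 9, "the row delimiter must fit in one byte");
    // LF for the top-level table, the ASCII digit of the depth otherwise.
    let delimiter = if depth == 0 { '\n' } else { char::from(b'0' + depth) };

    let mut rows = Vec::new();
    let mut row: Vec<String> = Vec::new();
    let mut field = String::new();
    let mut level = 0usize; // bracket nesting level; 0 means "between fields"

    for ch in text.chars() {
        match ch {
            '[' => {
                // An opening bracket inside a field belongs to a nested sub-table.
                if level > 0 {
                    field.push(ch);
                }
                level += 1;
            }
            ']' if level > 0 => {
                level -= 1;
                if level == 0 {
                    // The outermost bracket closed: the field is complete.
                    row.push(std::mem::take(&mut field));
                } else {
                    field.push(ch);
                }
            }
            // Everything inside a field is kept verbatim, digits included.
            _ if level > 0 => field.push(ch),
            // Between fields, the delimiter closes the current row.
            // The leading delimiter of a sub-table produces no empty row.
            _ if ch == delimiter => {
                if !row.is_empty() {
                    rows.push(std::mem::take(&mut row));
                }
            }
            _ => {} // spaces used for visual alignment are ignored
        }
    }
    rows
}
```

Calling `split_rows` with depth 0 on the two data rows of the example returns the sub-table cell `1[Ljubljana]1[Koper]1` as one raw field. Passing that field back in with depth 1 yields the rows `Ljubljana` and `Koper`.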
Row delimiter LF and sub-tables
The basic row delimiter is LF. Not CR, not CRLF, but exactly LF.
Every row must end with the row delimiter, especially the last row.
There is a small performance problem with sub-tables here.
Let me explain and come to the solution with this flow of thoughts.
For sequential reading of QVS21 files the inserted sub-tables are not a problem. It works well.
If we read every field sequentially, we know when the sub-table starts and ends. Easy. But sometimes we want to go very fast line by line and read only the first field for filtering. Because of sub-tables, the next LF is not always the start of a new row.
Let's solve this problem.
We can again use the fact that we know the different start and end delimiters.
So between the end of a field and the start of the new row we can put a different row delimiter.
The first level row delimiter is conveniently LF.
For the first sub-table the row delimiter changes to the string "1".
For the next depth level it is "2", and so on. The number is the depth of the sub-table.
At any moment it is clear what the row delimiter is for that particular sub-table.
Very important rule: every row, and the last row especially, must end with a row delimiter!
We must take care to limit the row delimiter to only one byte. It means there cannot be a sub-table nested deeper than 9 levels. And that is fine for this type of tabular format.
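Because sub-table rows end with a digit instead of LF, a reader that only wants the first field of each row can split the file on LF without tracking brackets. The following Rust sketch illustrates this. The function name `first_fields` is hypothetical, and the sketch assumes that fields never contain an unescaped LF or `]` and that the first column is not itself a sub-table.

```rust
/// Hypothetical sketch: returns the first field of every top-level row.
/// Assumptions: no unescaped LF or `]` inside fields, and the first
/// column holds plain values, not a sub-table.
fn first_fields(text: &str) -> Vec<&str> {
    text.split_terminator('\n') // every top-level row ends with LF
        .filter_map(|line| {
            // Skip lines that do not start with a field.
            let rest = line.strip_prefix('[')?;
            // The first closing bracket ends the first field.
            let end = rest.find(']')?;
            Some(&rest[..end])
        })
        .collect()
}
```

For the two data rows of the country example this returns `Slovenia` and `Italia`. The cities inside the sub-tables are never mistaken for new rows, because their rows end with "1" and not with LF.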
Schema 3rd row - Sub table schemas
SubTable schemas are written in the 3rd row.
For other columns, the field is empty.
I want the file extension to be specific for the version of the standard.
File extension and standard name are the same:
Read also the separate XXX.md files
I use the same README.md file for GitHub, Crates.io and docs.rs.
So I cannot include specific information that is not common to all 3 purposes. For that reason I have separate XXX.md files:
- DEVELOPMENT.md - information and instruction for development
- CHANGELOG.md - what changed between versions
- TODO.md - reminder of what is planned