wikidata-to-surrealdb/README.md
2023-12-15 07:55:27 +00:00

A tool for converting Wikidata dumps into a [SurrealDB](https://surrealdb.com/) database. Input can be either a bz2 or a JSON file.
# Getting The Data
Overview: [Wikidata:Data access](https://www.wikidata.org/wiki/Wikidata:Data_access)
## From bz2 file (Recommended) ~80GB
### Dump: [Docs](https://www.wikidata.org/wiki/Wikidata:Database_download)
### [Download - latest-all.json.bz2](https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2)
## From json file
### Linked Data Interface: [Docs](https://www.wikidata.org/wiki/Wikidata:Data_access#Linked_Data_Interface_(URI))
```
https://www.wikidata.org/wiki/Special:EntityData/Q60746544.json
https://www.wikidata.org/wiki/Special:EntityData/P527.json
```
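
The two URLs above follow the same pattern: the entity ID followed by `.json` under `Special:EntityData`. A minimal sketch of building such a URL is shown here; `entity_data_url` is a hypothetical helper and is not part of this tool, and the ID check (a `Q`, `P`, or `L` prefix followed by digits) is an assumption about which IDs you would want to accept.

```rust
/// Builds the Linked Data Interface URL for a Wikidata entity ID such as
/// "Q60746544" or "P527". `entity_data_url` is a hypothetical helper name.
fn entity_data_url(id: &str) -> Option<String> {
    let mut chars = id.chars();
    // The first character selects the entity kind: Q (item), P (property), L (lexeme).
    let prefix = chars.next()?;
    if !matches!(prefix, 'Q' | 'P' | 'L') {
        return None;
    }
    // Everything after the prefix must be a non-empty run of ASCII digits.
    let digits = chars.as_str();
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!(
        "https://www.wikidata.org/wiki/Special:EntityData/{id}.json"
    ))
}

fn main() {
    for id in ["Q60746544", "P527"] {
        match entity_data_url(id) {
            Some(url) => println!("{url}"),
            None => eprintln!("not a valid entity ID: {id}"),
        }
    }
}
```

The downloaded file can then be used as the JSON input described above.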
# Example .env
```
DB_USER=root
DB_PASSWORD=root
WIKIDATA_LANG=en
FILE_FORMAT=bz2
FILE_NAME=data/latest-all.json.bz2
```
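
To illustrate the `KEY=VALUE` format of the example above, here is a rough sketch of reading such a file into a map using only the standard library. It does not show how this tool actually loads its configuration. `parse_env` and `file_format_is_supported` are hypothetical names, and `json` as the second accepted `FILE_FORMAT` value is an assumption based on the two input formats mentioned in this README.

```rust
use std::collections::HashMap;

/// Parses `KEY=VALUE` lines into a map.
/// Blank lines and lines starting with `#` are skipped.
fn parse_env(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        // Split at the first `=` only, so values may contain `=` themselves.
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Hypothetical check: FILE_FORMAT should be `bz2` or (assumed) `json`.
fn file_format_is_supported(env: &HashMap<String, String>) -> bool {
    matches!(
        env.get("FILE_FORMAT").map(String::as_str),
        Some("bz2") | Some("json")
    )
}

fn main() {
    let example = "DB_USER=root\nDB_PASSWORD=root\nWIKIDATA_LANG=en\n\
                   FILE_FORMAT=bz2\nFILE_NAME=data/latest-all.json.bz2\n";
    let env = parse_env(example);
    println!("FILE_NAME = {:?}", env.get("FILE_NAME"));
    println!("format supported: {}", file_format_is_supported(&env));
}
```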
# How to Query
## See [Useful queries.md](./Useful%20queries.md)
# Table Layout
## Thing
```rust
pub struct Thing {
    pub table: String,
    pub id: Id,
}
```
## Table: Entity, Property, Lexeme
```rust
pub struct EntityMini {
    pub id: Option<Thing>,
    pub label: String,
    pub claims: Thing,
    pub description: String,
}
```
## Table: Claims
```rust
pub struct Claims {
    pub id: Option<Thing>,
    pub claims: Vec<Claim>,
}
```
## Table: Claim
```rust
pub struct Claim {
    pub id: Thing,
    pub value: ClaimData,
}
```
## ClaimData
```rust
pub enum ClaimData {
    Thing(Thing),
    ClaimValueData(ClaimValueData),
}
```
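
As a rough illustration of how these types fit together, the sketch below tells a claim that links to another record apart from one that carries a literal value. It compiles on its own, so `Id` and `ClaimValueData` are replaced by stand-ins whose shapes are assumptions (this README does not define them). `linked_record`, the example IDs, and the reading of `Claim::id` as the property being claimed are likewise illustrative only.

```rust
// Stand-ins so the sketch compiles on its own. The README does not define
// `Id` or `ClaimValueData`; both shapes below are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Id {
    Number(i64),
    String(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Thing {
    pub table: String,
    pub id: Id,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ClaimValueData {
    Text(String), // hypothetical variant
}

#[derive(Debug, Clone, PartialEq)]
pub enum ClaimData {
    Thing(Thing),
    ClaimValueData(ClaimValueData),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub id: Thing,
    pub value: ClaimData,
}

/// Returns the record a claim links to, or `None` when it holds a literal value.
pub fn linked_record(claim: &Claim) -> Option<&Thing> {
    match &claim.value {
        ClaimData::Thing(thing) => Some(thing),
        ClaimData::ClaimValueData(_) => None,
    }
}

fn main() {
    // Illustrative claim that points at another record.
    let link = Claim {
        id: Thing { table: "Property".into(), id: Id::Number(527) },
        value: ClaimData::Thing(Thing { table: "Entity".into(), id: Id::Number(42) }),
    };
    // Illustrative claim that carries a literal value instead.
    let literal = Claim {
        id: Thing { table: "Property".into(), id: Id::String("P1476".into()) },
        value: ClaimData::ClaimValueData(ClaimValueData::Text("some title".into())),
    };
    println!("{:?}", linked_record(&link));
    println!("{:?}", linked_record(&literal));
}
```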
# Similar Projects
- [wd2duckdb](https://github.com/weso/wd2duckdb)
- [wd2sql](https://github.com/p-e-w/wd2sql)
# License
All code in this repository is dual-licensed under either [LICENSE-MIT](./LICENSE-MIT) or [LICENSE-APACHE](./LICENSE-Apache), at your option. This means you can choose whichever license you prefer. See [Why dual license?](https://github.com/bevyengine/bevy/issues/2373)