There are several data types in the Seshat database.
| Type | Description |
|---|---|
| A/P/U/~ | Absent/Present/Unknown/Transitional |
| RANGE | Numeric Range |
| TEXT | Free Text |
| CHOICES | Fixed Choice List |
| COMPLEX | Multi-Field / Structured Input |
A/P/U/~
The most common data type is A/P/U/~, used across domains such as social complexity, warfare, and religion. Coding one of these variables means choosing a value from the A/P/U/~ list (see Table 2 below) and, separately, a confidence tag (see Table 3 below):
| Code in SQL | Code in HTML |
|---|---|
| present | Present |
| absent | Absent |
| A~P | Transitional (Absent → Present) |
| P~A | Transitional (Present → Absent) |
| unknown | Unknown |
| uncoded | Uncoded |
| Tag in SQL | Tag in HTML |
|---|---|
| TRS | Confident |
| SSP | Suspected |
| IFR | Inferred |
| UND | Undecided |
We’ve separated the tag from the value to keep the structure clean and extensible. This allows for easy transformation into combined formats such as “inferred present” or “inferred absent” during export or analysis. As Peter mentioned, it’s straightforward to map between this structure and merged tag-value formats in scripts or external outputs.
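As an illustration, here is a minimal Python sketch of such a mapping. It assumes values and tags arrive as the plain SQL strings listed in Tables 2 and 3. The function names `merge_tag_value` and `split_tag_value` are hypothetical, and so is the merged-label wording. In particular, the choice that TRS adds no prefix is an assumption, not Seshat's documented export format.

```python
# Hypothetical prefixes used when merging a confidence tag into a value label.
# Assumption: a confident (TRS) entry needs no prefix.
TAG_PREFIX = {
    "TRS": "",
    "IFR": "inferred",
    "SSP": "suspected",
    "UND": "undecided",
}

def merge_tag_value(value: str, tag: str) -> str:
    """Combine an A/P/U/~ value and its confidence tag into one label."""
    prefix = TAG_PREFIX[tag]  # raises KeyError for tags outside Table 3
    return f"{prefix} {value}" if prefix else value

def split_tag_value(label: str) -> tuple[str, str]:
    """Reverse the merge: recover (value, tag) from a combined label."""
    for tag, prefix in TAG_PREFIX.items():
        if prefix and label.startswith(prefix + " "):
            return label[len(prefix) + 1:], tag
    # No known prefix: treat the label as a bare, confident value.
    return label, "TRS"

print(merge_tag_value("present", "IFR"))   # inferred present
print(merge_tag_value("absent", "TRS"))    # absent
print(split_tag_value("inferred absent"))  # ('absent', 'IFR')
```

Because the tag lives in its own column, the round trip is a pure string transformation. The merged label never needs to be stored.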
Note on the tag SSP (Suspected):
This tag can only be assigned to rows coded as unknown. It acts as a temporary flag to signal uncertainty — typically raised by a Research Assistant for review by a senior researcher, who may confirm or revise the entry.
Note on the tag UND (Undecided):
In the current database structure for A/P/U/~ variables, no variable should have an empty coded value. During earlier stages of coding (particularly on the old website), some values were left uncoded. These records usually contained a reasonable description or references but lacked a definitive value (e.g., present or absent). When they were migrated to the SQL database, such records were explicitly assigned the code uncoded and the tag UND (Undecided). The front end no longer allows A/P/U/~ dropdowns to be left empty, so such phantom cases cannot be introduced going forward.
Example (from KzChion.html):
| polity_id | script | tag |
|---|---|---|
| 277 | uncoded | UND |
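The two rules above can be checked mechanically. The sketch below is not Seshat's actual validation code. It assumes each A/P/U/~ record is a Python dict with hypothetical `value` and `tag` keys holding the SQL codes from Tables 2 and 3. It also assumes, based on how the legacy records were migrated, that `uncoded` is valid only when paired with UND.

```python
VALUES = {"present", "absent", "A~P", "P~A", "unknown", "uncoded"}
TAGS = {"TRS", "SSP", "IFR", "UND"}

def check_apu_record(record: dict) -> list[str]:
    """Return rule violations for one A/P/U/~ record (empty list if valid)."""
    problems = []
    value, tag = record.get("value"), record.get("tag")
    # No A/P/U/~ variable may have an empty coded value.
    if not value:
        problems.append("empty value: use 'uncoded' with tag UND instead")
    elif value not in VALUES:
        problems.append(f"unknown value code: {value!r}")
    if tag not in TAGS:
        problems.append(f"unknown tag: {tag!r}")
    # SSP (Suspected) may only be attached to rows coded as unknown.
    if tag == "SSP" and value != "unknown":
        problems.append("SSP is only allowed on 'unknown' rows")
    # Assumption: legacy 'uncoded' rows always carry UND.
    if value == "uncoded" and tag != "UND":
        problems.append("'uncoded' rows are expected to be tagged UND")
    return problems

print(check_apu_record({"value": "uncoded", "tag": "UND"}))  # []
print(check_apu_record({"value": "present", "tag": "SSP"}))
# ["SSP is only allowed on 'unknown' rows"]
```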
RANGE
Numeric range variables (e.g., polity_population) are typically represented by two numeric fields: value_from and value_to.
- value_to may be left empty.
- Both value_from and value_to can be left blank. This explicitly indicates that the value is unknown.
- Each entry carries a confidence tag: TRS (Confident), IFR (Inferred), SSP (Suspected), or UND (Undecided).

When both fields are blank, the tag distinguishes three cases:
- SSP (Suspected) is only appropriate for entries that are left blank, i.e., unknown. It flags them for further review by a senior researcher.
- UND (Undecided) is used when the coder could not determine a value at the time of entry, even if descriptive information or references are present.
- TRS (Confident) is used when the coder is sure the value is unknown. In practice, however, these entries usually need review as well.

Example (from IqUbaid.html):
| polity_id | polity_population_from | polity_population_to | tag |
|---|---|---|---|
| 473 | (empty) | (empty) | SSP |
Example (from IqIsinL.html):
| polity_id | polity_population_from | polity_population_to | tag |
|---|---|---|---|
| 478 | (empty) | (empty) | UND |
Example (from YeNeoL*.html), tagged TRS:
| polity_id | polity_population_from | polity_population_to | tag |
|---|---|---|---|
| 79 | (empty) | (empty) | TRS |
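A comparable sketch can check RANGE rows. It assumes the two bounds arrive as numbers, or None for a blank field, alongside the tag. The function name `check_range_row` is hypothetical. Only two checks come from the rules above: blank bounds mean unknown, and SSP is restricted to such rows. The other two checks, rejecting a value_to without a value_from and rejecting a backwards range, are assumptions.

```python
from typing import Optional

def check_range_row(value_from: Optional[float],
                    value_to: Optional[float],
                    tag: str) -> list[str]:
    """Return rule violations for one RANGE row (empty list if valid)."""
    problems = []
    if tag not in {"TRS", "IFR", "SSP", "UND"}:
        problems.append(f"unknown tag: {tag!r}")
    # Both bounds blank explicitly means the value is unknown.
    is_unknown = value_from is None and value_to is None
    # SSP (Suspected) is only appropriate for blank, i.e. unknown, entries.
    if tag == "SSP" and not is_unknown:
        problems.append("SSP is only allowed when both bounds are blank")
    # Assumption: value_to may be empty, but not value_from alone.
    if value_from is None and value_to is not None:
        problems.append("value_to is set but value_from is blank")
    # Assumption: a range should not run backwards.
    if value_from is not None and value_to is not None and value_from > value_to:
        problems.append("value_from is greater than value_to")
    return problems

print(check_range_row(None, None, "SSP"))   # [] (both blank: unknown)
print(check_range_row(5000, None, "TRS"))   # [] (value_to left empty)
print(check_range_row(5000, 20000, "SSP"))
# ['SSP is only allowed when both bounds are blank']
```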
COMPLEX
Each variable classified as having the data type COMPLEX requires special handling, as its structure differs significantly from the traditional data formats used in Seshat.
These datasets were originally created before the database schema was formalized. Because each row contains multiple binary-coded variables, we merged the value and confidence tag into a single dropdown. The standardized options are:
| Original Code | Code in SQL | Meaning |
|---|---|---|
| A | A | Absent |
| P | P | Present |
| A* | IA | Inferred Absent |
| P* | IP | Inferred Present |
| U* | U | Unknown |
| SU* | SU | Suspected Unknown |
Blank spreadsheet cells or unselected dropdown options during front-end coding will result in the value being stored as NULL or None.
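The table above is a straightforward lookup. The sketch below normalizes original spreadsheet codes to their SQL codes and turns blank cells into None, as described. The function name `normalize_complex_code` is hypothetical. Raising an error on unrecognized codes, rather than storing them, is a design assumption.

```python
from typing import Optional

# Original spreadsheet code -> SQL code, as listed in the COMPLEX table.
COMPLEX_CODES = {
    "A": "A",     # Absent
    "P": "P",     # Present
    "A*": "IA",   # Inferred Absent
    "P*": "IP",   # Inferred Present
    "U*": "U",    # Unknown
    "SU*": "SU",  # Suspected Unknown
}

def normalize_complex_code(cell: Optional[str]) -> Optional[str]:
    """Map one original COMPLEX cell to its SQL code; blanks become None."""
    if cell is None or not cell.strip():
        return None  # stored as NULL in the database
    code = cell.strip()
    if code not in COMPLEX_CODES:
        raise ValueError(f"unrecognized COMPLEX code: {cell!r}")
    return COMPLEX_CODES[code]

row = ["P*", "A", "", "SU*"]
print([normalize_complex_code(c) for c in row])  # ['IP', 'A', None, 'SU']
```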
Note on year_from / year_to:
In the power_transition dataset, both year_to and year_from were originally coded, but going forward only year_to should be used. year_from can be safely ignored in new data or analyses.