Seshat Data Coding Protocols

There are several data types in the Seshat database.

| Type | Description |
| --- | --- |
| A/P/U/~ | Absent/Present/Unknown/Transitional |
| RANGE | Numeric Range |
| TEXT | Free Text |
| CHOICES | Fixed Choice List |
| COMPLEX | Multi-Field / Structured Input |

Data Type: A/P/U/~

The most common data type is A/P/U/~, used across domains such as social complexity, warfare, and religion. Coding these variables means choosing one of the options from the A/P/U/~ list (see Table 2 below) and a separate confidence tag (see Table 3 below):

Table 2: Valid Codes for absent_present Variables

| Code in SQL | Code in HTML |
| --- | --- |
| present | Present |
| absent | Absent |
| A~P | Transitional (Absent → Present) |
| P~A | Transitional (Present → Absent) |
| unknown | Unknown |
| uncoded | Uncoded |

Table 3: Valid Tags for all Variables

| Tag in SQL | Tag in HTML |
| --- | --- |
| TRS | Confident |
| SSP | Suspected |
| IFR | Inferred |
| UND | Undecided |

We’ve separated the tag from the value to keep the structure clean and extensible. This allows for easy transformation into combined formats such as “inferred present” or “inferred absent” during export or analysis. As Peter mentioned, it’s straightforward to map between this structure and merged tag-value formats in scripts or external outputs.
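As an illustration of that mapping, here is a minimal Python sketch that combines a stored value and tag into a merged label. The function name `merge_tag_and_value`, the lowercase output wording, and the choice to drop the prefix for TRS (so that a confident "present" is rendered simply as "present") are assumptions made for this example, not part of the Seshat codebase.

```python
# Codes come from Tables 2 and 3; the merged wording is an assumption.
VALUE_LABELS = {
    "present": "present",
    "absent": "absent",
    "A~P": "transitional (absent to present)",
    "P~A": "transitional (present to absent)",
    "unknown": "unknown",
    "uncoded": "uncoded",
}

# TRS (Confident) adds no prefix in this sketch.
TAG_PREFIXES = {"TRS": "", "SSP": "suspected", "IFR": "inferred", "UND": "undecided"}


def merge_tag_and_value(value: str, tag: str) -> str:
    """Return a merged label such as 'inferred present' (hypothetical helper)."""
    if value not in VALUE_LABELS:
        raise ValueError(f"unknown value code: {value!r}")
    if tag not in TAG_PREFIXES:
        raise ValueError(f"unknown tag: {tag!r}")
    # strip() removes the leading space left by the empty TRS prefix.
    return f"{TAG_PREFIXES[tag]} {VALUE_LABELS[value]}".strip()


print(merge_tag_and_value("present", "IFR"))
print(merge_tag_and_value("unknown", "SSP"))
```

Because the value and the tag stay in separate columns, a different merged vocabulary only requires changing the two dictionaries.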

Note on the tag SSP (Suspected):
This tag can only be assigned to rows coded as unknown. It acts as a temporary flag to signal uncertainty — typically raised by a Research Assistant for review by a senior researcher, who may confirm or revise the entry.

Note on the tag UND (Undecided):
In the current database structure for A/P/U/~ variables, no variable should have an empty coded value. During earlier stages of coding (particularly on the old website), some values were left uncoded. These records usually contained a reasonable description or references but lacked a definitive value (e.g., present or absent). They have been migrated to the SQL database and are now explicitly marked with the code uncoded and the tag UND (Undecided). On the front-end, it is no longer possible to leave an A/P/U/~ dropdown empty, so such phantom cases cannot be introduced going forward.

See the example below for more clarity:
Old File: KzChion.html
The value has not been coded, although there is a good description and a reference.
SQL Database

| polity_id | script | tag |
| --- | --- | --- |
| 277 | uncoded | UND |
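
The rules for A/P/U/~ records described above can be sketched as a small check. This is only an illustration: the function name `check_absent_present` is hypothetical, and treating "uncoded must carry UND" as a strict rule is an assumption based on how the migrated records are described.

```python
VALID_VALUES = {"present", "absent", "A~P", "P~A", "unknown", "uncoded"}
VALID_TAGS = {"TRS", "SSP", "IFR", "UND"}


def check_absent_present(value, tag):
    """Return a list of rule violations for one record (empty if none)."""
    problems = []
    # An empty value (None or "") is not in VALID_VALUES, so it is flagged too.
    if value not in VALID_VALUES:
        problems.append(f"invalid value: {value!r}")
    if tag not in VALID_TAGS:
        problems.append(f"invalid tag: {tag!r}")
    if tag == "SSP" and value != "unknown":
        problems.append("SSP may only be used with the value 'unknown'")
    # Assumption: migrated 'uncoded' records always carry the tag UND.
    if value == "uncoded" and tag != "UND":
        problems.append("the value 'uncoded' is expected to carry the tag UND")
    return problems


# The migrated record from the example above passes the check.
print(check_absent_present("uncoded", "UND"))
# SSP on a coded value is reported.
print(check_absent_present("present", "SSP"))
```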

Data Type: RANGE

  • Range variables (e.g., polity_population): These are typically represented by two numeric fields — value_from and value_to.
    • If a precise value is known rather than a range, both fields should be set to the same number, or alternatively, value_to may be left empty.
    • From the front-end, both value_from and value_to can be left blank. This explicitly indicates that the value is unknown.
  • Use of tags: As with absent/present-type variables, range variables may also be accompanied by confidence tags: TRS (Confident), IFR (Inferred), SSP (Suspected), or UND (Undecided).
    • The tag SSP (Suspected) is only appropriate for entries that are left blank, i.e., unknown. It flags them for further review by a senior researcher.
    • The tag UND (Undecided) is used when the coder could not determine a value at the time of entry, even if descriptive information or references are present.
    • The tag TRS (Confident) is used when the coder is sure the value is unknown. In practice, however, such entries usually need some review as well.
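
The handling of value_from and value_to described above can be sketched as follows. The helper name `normalize_range` is hypothetical, and the two error cases (value_to set while value_from is blank, and value_to smaller than value_from) are assumptions, since the protocol does not say how such rows should be treated.

```python
from typing import Optional, Tuple


def normalize_range(
    value_from: Optional[float], value_to: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (low, high), or None when both fields are blank (value unknown)."""
    if value_from is None and value_to is None:
        return None  # both blank: explicitly unknown
    if value_from is None:
        # Assumption: a lone value_to is treated as a data-entry error.
        raise ValueError("value_to is set but value_from is blank")
    if value_to is None:
        return (value_from, value_from)  # precise value, value_to left empty
    if value_to < value_from:
        # Assumption: reversed bounds are rejected rather than swapped.
        raise ValueError("value_to must not be smaller than value_from")
    return (value_from, value_to)


print(normalize_range(5000, None))
print(normalize_range(1000, 2000))
print(normalize_range(None, None))
```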
See examples below for more clarity:
Old File: IqUbaid.html
The value has been coded suspected unknown.
SQL Database

| polity_id | polity_population_from | polity_population_to | tag |
| --- | --- | --- | --- |
| 473 | (empty) | (empty) | SSP |

Old File: IqIsinL.html
The value has not been coded, although there is a good description and a reference.
SQL Database

| polity_id | polity_population_from | polity_population_to | tag |
| --- | --- | --- | --- |
| 478 | (empty) | (empty) | UND |

Old File: YeNeoL*.html
The value has been coded unknown, hence the tag TRS.
SQL Database

| polity_id | polity_population_from | polity_population_to | tag |
| --- | --- | --- | --- |
| 79 | (empty) | (empty) | TRS |
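
The three example rows above differ only in their tag, so any script reading the data has to interpret blank ranges through the tag. A minimal sketch of that interpretation follows; the helper name `describe_range_row` and the returned wording are assumptions, and the fallback for other tags on a blank row (such as IFR) is included only because the protocol does not describe that case.

```python
# Meaning of a blank range (both fields empty), keyed by tag.
BLANK_RANGE_MEANING = {
    "SSP": "suspected unknown",
    "UND": "not coded",
    "TRS": "unknown",
}


def describe_range_row(value_from, value_to, tag):
    """Describe one range row in words (hypothetical helper)."""
    if value_from is None and value_to is None:
        # Tags outside the three described cases are flagged, not guessed.
        return BLANK_RANGE_MEANING.get(tag, "blank with unexpected tag")
    if tag == "SSP":
        return "invalid: SSP is only appropriate for blank entries"
    return "coded"


# (polity_id, polity_population_from, polity_population_to, tag)
rows = [
    (473, None, None, "SSP"),
    (478, None, None, "UND"),
    (79, None, None, "TRS"),
]
for polity_id, low, high, tag in rows:
    print(polity_id, describe_range_row(low, high, tag))
```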

Data Type: COMPLEX

Each variable classified as having the data type COMPLEX requires special handling, as its structure differs significantly from the traditional data formats used in Seshat.

Power Transitions & Crisis Consequences

These datasets were originally created before the database schema was formalized. Because each row contains multiple binary-coded variables, we merged the value and confidence tag into a single dropdown. The standardized options are:

| Original Code | Code in SQL | Meaning |
| --- | --- | --- |
| A | A | Absent |
| P | P | Present |
| A* | IA | Inferred Absent |
| P* | IP | Inferred Present |
| U* | U | Unknown |
| SU* | SU | Suspected Unknown |

Blank spreadsheet cells or unselected dropdown options during front-end coding will result in the value being stored as NULL or None.
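
A minimal sketch of this conversion, assuming the original codes arrive as strings read from a spreadsheet: the function name `to_sql_code` is hypothetical, and trimming surrounding whitespace before the lookup is an assumption.

```python
from typing import Optional

# Mapping taken from the table above.
ORIGINAL_TO_SQL = {
    "A": "A",
    "P": "P",
    "A*": "IA",
    "P*": "IP",
    "U*": "U",
    "SU*": "SU",
}


def to_sql_code(original: Optional[str]) -> Optional[str]:
    """Convert an original spreadsheet code to its SQL code; blank becomes None."""
    if original is None or original.strip() == "":
        return None  # blank cell or unselected dropdown
    # Unrecognized codes raise KeyError instead of being silently stored.
    return ORIGINAL_TO_SQL[original.strip()]


print(to_sql_code("P*"))
print(to_sql_code(""))
```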

Note on year_from / year_to:
In the power_transition dataset, both year_to and year_from were originally coded, but going forward only year_to should be used. year_from can be safely ignored in new data or analyses.