Seshat

Seshat Data Coding Protocols

There are several data types in the Seshat database.

Type	Description
`A/P/U/~`	Absent/Present/Unknown/Transitional
`RANGE`	Numeric Range
`TEXT`	Free Text
`CHOICES`	Fixed Choice List
`COMPLEX`	Multi-Field / Structured Input

Data Type: A/P/U/~

The most common data type is A/P/U/~, which is used across domains such as social complexity, warfare, and religion. Coding these variables means choosing one option from the A/P/U/~ list (see Table 2 below) and a separate confidence tag (see Table 3 below):

Table 2: Valid Codes for `absent_present` Variables
Code in SQL	Code in HTML
`present`	Present
`absent`	Absent
`A~P`	Transitional (Absent → Present)
`P~A`	Transitional (Present → Absent)
`unknown`	Unknown
`uncoded`	Uncoded

Table 3: Valid Tags for all Variables
Tag in SQL	Tag in HTML
`TRS`	Confident
`SSP`	Suspected
`IFR`	Inferred
`UND`	Undecided

We’ve separated the tag from the value to keep the structure clean and extensible. This allows for easy transformation into combined formats such as “inferred present” or “inferred absent” during export or analysis. As Peter mentioned, it’s straightforward to map between this structure and merged tag-value formats in scripts or external outputs.
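
The mapping from the separated structure to a merged label can be sketched as follows. This is a minimal illustration, not the project's export code: the names `VALUE_LABELS`, `TAG_PREFIXES`, and `merged_label` are hypothetical, and the choice to give `TRS` and `UND` no prefix is an assumption based on the old-file examples, where a confident unknown is written simply as "unknown".

```python
# Labels taken from Table 2 (value codes).
VALUE_LABELS = {
    "present": "present",
    "absent": "absent",
    "A~P": "transitional (absent to present)",
    "P~A": "transitional (present to absent)",
    "unknown": "unknown",
    "uncoded": "uncoded",
}

# Tags from Table 3. Assumption: TRS and UND add no prefix to the merged label.
TAG_PREFIXES = {"TRS": "", "SSP": "suspected", "IFR": "inferred", "UND": ""}


def merged_label(value: str, tag: str) -> str:
    """Combine a stored value code and tag into one human-readable label."""
    if value not in VALUE_LABELS:
        raise ValueError(f"unknown value code: {value!r}")
    if tag not in TAG_PREFIXES:
        raise ValueError(f"unknown tag: {tag!r}")
    prefix = TAG_PREFIXES[tag]
    label = VALUE_LABELS[value]
    return f"{prefix} {label}" if prefix else label


print(merged_label("present", "IFR"))  # inferred present
print(merged_label("unknown", "SSP"))  # suspected unknown
print(merged_label("absent", "TRS"))   # absent
```

Keeping the two lookup tables separate mirrors the database layout: a new tag or value code only needs one new dictionary entry.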

Note on the tag SSP (Suspected):
This tag can only be assigned to rows coded as unknown. It acts as a temporary flag to signal uncertainty — typically raised by a Research Assistant for review by a senior researcher, who may confirm or revise the entry.

Note on the tag UND (Undecided):
In the current database structure for A/P/U/~ variables, no variable should have an empty coded value. During earlier stages of coding (particularly on the old website), some values were left uncoded. These cases normally contained a decent description or references but lacked a definitive value (e.g., present or absent). Such records have been migrated to the SQL database and are now explicitly marked with the code uncoded and the tag UND (Undecided). From the front-end, it is no longer possible to leave dropdown selections for A/P/U/~ empty, which ensures that such phantom cases cannot be introduced going forward.

See the example below for more clarity:

Old File: KzChion.html

♠ Script ♣ ♥ "the early steppe peoples would not have been a promising vehicle for the diffusion of complicated, textually based knowledge; according to the Northern Wei dynastic history, the Rouran were illiterates whose leaders at first kept records of their troop numbers by piling up sheep turds as counters but eventually graduated to scratching simple marks onto pieces of wood. Not surprisingly, there is no evidence of the transmission of Chinese military theories and texts to the West by way of the Avars, other steppe nomads, Silk Road caravans, or any other channel prior to the activities of the Jesuit missionaries in the seventeenth and eighteenth centuries."^[52]

The value has not been coded, although there is a good description and a reference.

SQL Database

polity_id	script	tag
277	uncoded	`UND`
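
The rules for A/P/U/~ rows described above can be expressed as a small consistency check. This is a sketch under stated assumptions rather than the validation the database actually performs: `check_apu_record` is a hypothetical helper, and the requirement that every `uncoded` row carries `UND` is a reading of the migration note above.

```python
VALID_VALUES = {"present", "absent", "A~P", "P~A", "unknown", "uncoded"}
VALID_TAGS = {"TRS", "SSP", "IFR", "UND"}


def check_apu_record(value, tag):
    """Return a list of rule violations for one A/P/U/~ row (empty if none)."""
    problems = []
    # No A/P/U/~ variable should have an empty or unlisted coded value.
    if value not in VALID_VALUES:
        problems.append(f"invalid or empty value: {value!r}")
    if tag not in VALID_TAGS:
        problems.append(f"invalid tag: {tag!r}")
    # SSP can only be assigned to rows coded as unknown.
    if tag == "SSP" and value != "unknown":
        problems.append("SSP may only be used with the value 'unknown'")
    # Assumption: migrated 'uncoded' rows always carry the tag UND.
    if value == "uncoded" and tag != "UND":
        problems.append("'uncoded' rows are expected to carry the tag UND")
    return problems


print(check_apu_record("uncoded", "UND"))  # []
print(check_apu_record("present", "SSP"))
```

The first call corresponds to the KzChion example row (polity_id 277) and reports no problems; the second reports the misuse of `SSP`.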

Data Type: RANGE

Range variables (e.g., polity_population) are typically represented by two numeric fields: value_from and value_to.
- If a precise value is known rather than a range, both fields should be set to the same number, or alternatively, value_to may be left empty.
- From the front-end, both value_from and value_to can be left blank. This explicitly indicates that the value is unknown.
Use of tags: As with absent/present-type variables, range variables may also be accompanied by confidence tags: TRS (Confident), IFR (Inferred), SSP (Suspected), or UND (Undecided).
- The tag SSP (Suspected) is only appropriate for entries that are left blank, i.e., unknown. It flags them for further review by a senior researcher.
- The tag UND (Undecided) is used when the coder could not determine a value at the time of entry, even if descriptive information or references are present.
- The tag TRS (Confident) is used when the coder is sure the value is unknown. In our experience, however, these codes normally need some review as well.

See examples below for more clarity:

Old File: IqUbaid.html

♠ Polity Population ♣ suspected unknown ♥ People. The researchers deeply believed that the north Ubaid was more populated than southern regions of Ubaid. ^{[23], [24]} There are known some calculation regarding the size of populations inhabited some particular sites such as Tell al-Hawa (1500-4000 people, area of the site - 15-20 ha), Site 118 (500-1200 people; area of the site- 5-6 ha) and Khanijdal East (100-200 people, area of the site- 1ha). There are based on a range of on-site population densities of 100 to 200 people per ha. ^[25]

The value has been coded suspected unknown.

SQL Database

polity_id	polity_population_from	polity_population_to	tag
473	(empty)	(empty)	`SSP`

Old File: IqIsinL.html

♠ Polity Population ♣ ♥ People: "Despite these changes, the total number of inhabitants and the relations between cities and villages remained roughly the same [as in the Ur III period]."^[11]

The value has not been coded, although there is a good description and a reference.

SQL Database

polity_id	polity_population_from	polity_population_to	tag
478	(empty)	(empty)	`UND`

Old File: YeNeoL*.html

♠ Polity Population ♣ unknown ♥ ^[41] "Fig. 4.2. Qotakalli sites in the Cusco Basin (after AD 400)" redrawn from Bauer. ^[42] Qotakalli sites in the Cuzco Basin 1-5 ha sites: 16 0.25-1 ha sites: 35 If the 16 largest sites average 2.5 ha, and the 35 smallest sites averaged 0.625 ha Qotakalli sites cover a total of 61.875 ha. "Strong population growth occurred during this period" as revealed by settlement pattern data.[43] ^[43]

The value has been coded unknown, hence the tag TRS.

SQL Database

polity_id	polity_population_from	polity_population_to	tag
79	(empty)	(empty)	`TRS`
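
The range rules and the three example rows above can be sketched as a single function. This is a hedged illustration only: `interpret_range` is a hypothetical name, `(empty)` is represented as `None`, and the handling of a lone value_to or of reversed bounds is an assumption, since the protocol does not address those cases.

```python
from typing import Optional, Tuple


def interpret_range(value_from: Optional[float],
                    value_to: Optional[float],
                    tag: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) for a coded range, or None when the value is unknown."""
    if value_from is None and value_to is None:
        return None  # both fields blank: the value is unknown
    if tag == "SSP":
        raise ValueError("SSP is only appropriate for blank (unknown) entries")
    if value_from is None:
        # Assumption: a lone value_to is treated as a data-entry error.
        raise ValueError("value_to given without value_from")
    if value_to is None:
        return (value_from, value_from)  # precise value, value_to left empty
    if value_from > value_to:
        # Assumption: the protocol does not say how reversed bounds are handled.
        raise ValueError("value_from is greater than value_to")
    return (value_from, value_to)


# The three example rows above, with (empty) represented as None.
for polity_id, tag in [(473, "SSP"), (478, "UND"), (79, "TRS")]:
    print(polity_id, tag, interpret_range(None, None, tag))
```

All three example rows return `None`; only the tag distinguishes a suspected unknown, an undecided entry, and a confident unknown.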

Data Type: COMPLEX

Each variable classified as having the data type COMPLEX requires special handling, as its structure differs significantly from the traditional data formats used in Seshat.

Power Transitions & Crisis Consequences

These datasets were originally created before the database schema was formalized. Because each row contains multiple binary-coded variables, we merged the value and confidence tag into a single dropdown. The standardized options are:

Original Code	Code in SQL	Meaning
A	A	Absent
P	P	Present
A*	IA	Inferred Absent
P*	IP	Inferred Present
U*	U	Unknown
SU*	SU	Suspected Unknown

Blank spreadsheet cells or unselected dropdown options during front-end coding will result in the value being stored as NULL or None.
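
For analyses that need these merged codes in the same value-plus-tag form as the A/P/U/~ variables, the table above can be turned into two lookups. This is a sketch, not part of the database code: `split_merged_code` is a hypothetical helper, and assigning `TRS` to the plain codes A, P, and U is an assumption, because the table does not state a tag for them.

```python
# Original spreadsheet code -> code stored in SQL (from the table above).
ORIGINAL_TO_SQL = {
    "A": "A", "P": "P", "A*": "IA", "P*": "IP", "U*": "U", "SU*": "SU",
}

# Assumption: plain A, P, and U are treated as confident (TRS).
SQL_TO_VALUE_TAG = {
    "A": ("absent", "TRS"),
    "P": ("present", "TRS"),
    "IA": ("absent", "IFR"),
    "IP": ("present", "IFR"),
    "U": ("unknown", "TRS"),
    "SU": ("unknown", "SSP"),
}


def split_merged_code(original):
    """Map an original code to (value, tag); blank cells map to None."""
    if original is None or original.strip() == "":
        return None  # blank cell: stored as NULL / None
    sql_code = ORIGINAL_TO_SQL[original.strip()]
    return SQL_TO_VALUE_TAG[sql_code]


print(split_merged_code("P*"))   # ('present', 'IFR')
print(split_merged_code("SU*"))  # ('unknown', 'SSP')
print(split_merged_code(""))     # None
```

An unrecognized code raises `KeyError`, which surfaces typos in the source spreadsheet instead of silently storing them.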

Note on year_from / year_to:
In the power_transition dataset, both year_to and year_from were originally coded, but going forward only year_to should be used. year_from can be safely ignored in new data or analyses.