Topicalizing and inferential la in Toki Pona: Thoughts on constituent structure

For a while now I’ve been trying to apply my knowledge of Lexical-Functional Grammar (LFG; Kaplan & Bresnan 1982; most recently Dalrymple 2023; for a quick overview, see e.g. Müller 2023: 223–246; Belyaev 2023a) to Toki Pona (Lang 2014; 2021) in order to analyze its constituent structure. LFG’s constraint-based approach seems like an overall useful way to describe syntactic structures, even in a totally isolating language like Toki Pona. The fact that Toki Pona all but lacks morphology makes this an especially nice brain-teaser. You can find the working draft of what’s by now grown into a whole treatise on Codeberg. And while I’ve come pretty far since I started this project during the 2024/25 Christmas break, the construction usually referred to as a “la phrase” keeps nagging at the back of my mind. So let me explain it here in order to sort out my own thoughts.

Exposition

In the words of Lang (2014: 51), la “allows you to link two sentences, or link a fragment to a sentence.” This is a very broad statement, though as a construction, la phrases are pretty versatile indeed and occur frequently. In the more concrete words of sona pona (s.v. la), la as a grammatical marker

“indicates that what comes after it follows from or depends on what came before. It is extremely general, and can indicate all sorts of antecedent–consequent relationships, including prepositional descriptions, cause and effect, conditions, or where, when, or how something happens. Generally, what comes before la is the context for what comes after.”

In its most typical use, la is a topicalizer, that is, it marks the part at the beginning of a sentence that tells you what the sentence “is about,” highlighting contextually important information, loosely speaking.¹ A typical example of la fronting an adverbial in order to highlight it is given in (1).

(1) a. jan Ale li pali e ko pan kepeken ilo.

person Ale IND work OBJ paste grain with tool

‘Alex prepares the batter with a mixer.’

b. kepeken ilo la, jan Ale li pali e ko pan.

with tool TOP person Ale IND work OBJ paste grain

‘It’s with a mixer that Alex prepares the batter.’

Example (1a) contains an adverbial prepositional phrase (PP) kepeken ilo ‘with a tool’, where in context the tool happens to be a mixer. In example (1b), this adverbial is fronted, and the topicalized phrase uses la to signal that what comes before it is a separate constituent in front of the comment part of the clause after it, jan Ale li pali e ko pan ‘Alex is preparing the batter’.

What’s more, la can mark sentence adverbials, which are likewise located in the topic position, as illustrated in (2a). In those cases, a sentence adverb like pona ‘fortunately’ has no correlate in the comment, unlike the adverbial kepeken ilo ‘with a tool’ in (1).²

(2) a. pona la, ona li kama.

good TOP 3 IND come

‘Fortunately, they’re coming.’

b. # ona li kama pona.

3 IND come good

‘They’re arriving well.’

That is, the adverb pona in (2a) relates to the whole comment ona li kama ‘they’re coming’ as such, rather than characterizing the action kama ‘come’ as in (2b). Without changing the meaning, example (2a) may be rephrased as in (3), showing that the sentence is essentially composed of two correlated statements: ni li pona ‘this is good’ and ona li kama ‘they’re coming’. The quality “good” is explicitly assigned to the fact of their arrival, not to the action of arriving as such.

(3) ni li pona: ona li kama.

DEM IND good 3 IND come

‘This is good: They’re coming.’ (or: ‘It’s good that they’re coming.’)

Adverbials in the topic position like in example (2a), which are syntactically integrated into the clause but aren’t semantically part of it, as illustrated by (2b) and (3), are called disjunct adverbials and pose a certain challenge for a formal analysis. Before delving into a more detailed discussion, however, let’s first look at some theoretical background regarding Toki Pona’s constituent structure (c-structure) and at previous research on disjunct adverbials that is also useful for an analysis of la phrases.

Intermission

Toki Pona is an extremely consistently head-initial language: in general, modifiers follow their heads. This principle operates on all levels of syntax, from the order of syntactic constituents (SVO; subject > predicate) to the order of adjectives (N > Adj), demonstratives (N > Dem), possessors (N > Poss), adpositions (Prep > N), etc. Moreover, as part of Toki Pona’s minimalist philosophy, there are no subordinate clauses. Relative clauses, adverbial clauses, and closed complements (COMP) are proscribed by the grammar, so clauses can’t recursively contain clauses as adjuncts or complements (cf. sona pona: s.v. Recursion). Open complements (XCOMP) and closed predicative complements (PREDLINK) still exist, however. Overall, the X′ schema in figure 1 may be assumed as a rule.

Figure 1: Right-branching X′ schema with LFG extensions (constituent-structure tree)

In the constituent-structure tree (c-structure) on the left side, the specifier position is to the very left. The main reason to assume this is that, by Toki Pona’s strict SVO constituent order, subjects always come first. The specifier position is also typically associated with syntactic subjects. Apart from this, specifiers don’t play a role in Toki Pona’s grammar. Next up are the phrasal head and its complement. Since Toki Pona is generally right-branching, complements are found to the right of their heads.³ Adjuncts—that is, optional modifiers of the head like adjectives, possessors, etc.—follow at the very end of the phrase.

The functional structure (f-structure) on the right side describes the grammatical properties of the elements in the phrase based on the semantic content of its lexical items. Since the head is central to the phrase, it forms the predicator (PRED) of the f-structure labeled f. In this example, the head is also indicated to possess an argument structure (a-structure; in <…>), which here indicates that it requires a complement (COMPL) in the same f-structure (f). Grammatical information on the head’s complement is accordingly found within f in the complement function (COMPL), here labeled i. Since adjuncts are iterable, a phrasal head can be modified by multiple adjuncts, which all become part of the adjunct-function set (ADJ), labeled h. This is why the ZP node in the c-structure is annotated with ↓ ∈ (↑ ADJ), indicating that the current node (↓) is an element (∈) of the adjunct function of the next higher phrasal node (↑).
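To make the mapping from c-structure to f-structure a bit more tangible, here is a minimal Python sketch of the schema described above. The class Phrase, the function f_structure, and the use of nested dicts are purely illustrative assumptions of mine, not part of any LFG toolkit. The annotation (↑ COMPL) = ↓ in the comments is the usual LFG notation for mapping a complement, which I'm assuming matches figure 1. A list stands in for the ADJ set only because Python dicts can't be set members.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    """A phrase in the right-branching X' schema (hypothetical helper class)."""
    head: str
    complement: Optional["Phrase"] = None
    adjuncts: List["Phrase"] = field(default_factory=list)


def f_structure(phrase: Phrase) -> dict:
    """Build a nested dict standing in for the phrase's f-structure."""
    # Head: its information ends up in the mother's f-structure (up = down).
    f = {"PRED": phrase.head}
    if phrase.complement is not None:
        # Complement: (up COMPL) = down
        f["COMPL"] = f_structure(phrase.complement)
    if phrase.adjuncts:
        # Adjuncts: each one is a member of the mother's ADJ set,
        # i.e. down is an element of (up ADJ).
        f["ADJ"] = [f_structure(a) for a in phrase.adjuncts]
    return f


# kepeken ilo 'with a tool': head plus complement
pp = Phrase("kepeken", complement=Phrase("ilo"))
# ko pan 'batter': head plus one adjunct
np = Phrase("ko", adjuncts=[Phrase("pan")])

print(f_structure(pp))  # {'PRED': 'kepeken', 'COMPL': {'PRED': 'ilo'}}
print(f_structure(np))  # {'PRED': 'ko', 'ADJ': [{'PRED': 'pan'}]}
```

The sketch ignores specifiers and argument structure; it only shows how heads, complements, and iterable adjuncts each land in a different place of the same f-structure.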

Regarding the syntactic analysis of disjunct adverbials, there seems to be no generally accepted formal model in LFG at the moment to describe modification relationships between formally independent clauses. Dalrymple, Lowe & Mycock (2019: 394) report that “[l]ittle work currently exists in LFG on discourse structure (that is, cross-clausal information structuring),” and to the best of my knowledge, this state of affairs hasn’t changed since. There’s an MPhil thesis on a typology of adverbs and ways to model them in LFG (supervised by Dalrymple); however, the author explicitly excludes a detailed discussion of “comma intoned adverbials” (Cobb 2006: 6), as they call disjunct adverbials for their characteristic intonation pattern. Instead, they refer readers to a forthcoming paper, which apparently has never been published.

While Cobb (2006) doesn’t go into detail on disjunct adverbials much, they at least provide the example in figure 2 as a model. Here, like in example (2a), the adverbs obviously and clearly have been fronted as topics and are analyzed in f-structure as information disjoined from the main clause in h. Even though the whole sentence forms a syntactic unit with inflectional phrase (IP) stacked on inflectional phrase, the f-structures f and g are displayed as independent of h, not integrated into it.

Cobb (2006: 93) doesn’t provide functional annotations on the phrasal nodes in c-structure, so it’s not obvious how the disjunct reading arises formally (DIS = displaced information). One might expect an annotation like ↑ ≠ ↓ in analogy to ↑ = ↓ in figure 1 on the IP sister nodes of the attribute phrases (AP) carrying the adverbs: the next higher node (↑) is not a superset of the below information (↓). However, it’s unclear to me where this constraint breaking the upward percolation (and set unification) of grammatical information would originate from in the English example.

Discussion

In spite of Toki Pona’s proscription against subordinate clauses, a la phrase may well contain a whole clause, as illustrated by (4a). Indeed, the only way to combine mi lape pona ‘I’ll sleep well’ and mi pali mute ‘I worked a lot’ into one sentence is by topicalizing one of the two clauses. The latter clause, mi pali mute, in this context plays the part of an adverbial clause, providing the reason for mi lape pona. The sentence in (4a) can be rephrased accordingly as in (4b). How the two clauses relate to each other semantically is entirely up to context.

(4) a. mi pali mute la, mi lape pona.

1 work much TOP 1 sleep well

‘Since I worked a lot, I’ll sleep well.’

b. mi lape pona tan ni: mi pali mute.

1 sleep well because.of DEM 1 work much

‘I’ll sleep well because I worked a lot.’

While mi pali mute ‘I’ve worked a lot’ in (4a) is an adverbial clause providing a reason for the statement in the comment, mi lape pona ‘I’ll sleep well’, I can’t manage to interpret the adverbial information as originating from inside the comment part, that is, as a sister element to pona in the set of adjuncts (ADJ; see figure 1). Besides the fact that a whole clause can’t serve as an adjunct in Toki Pona because of its restrictions on recursion, only constituents can be fronted, not individual adjuncts. Examples (5a) and (5b) thus aren’t synonymous, just as (2a) and (2b) aren’t.

(5) a. mi lape pona.

1 sleep well

‘I’ll sleep well.’

b. # pona la, mi lape.

good TOP 1 sleep

‘Fortunately, I’ll sleep.’

One of the issues with the adverbial clause in (4a) is that if it originated as an adjunct inside the comment part of the clause, *I’ll sleep well and because I worked a lot might be expected to work, in English as well, because the adverbial clause would be a sister element of the adverb well, and like grammatical elements can be linked by and. That’s obviously not the case for our example, though. It may be deduced that the adverbial clause in the topic is thus rather unlikely to relate to a syntactic gap in the comment. Instead, it’s an adjunct of the whole clause making up the statement in the comment part as such, probably at the superordinate level of discourse structure (d-structure). Figure 3 shows an attempt to model the sentence in (4a) based on Cobb’s (2006: 93) schema in figure 2.

I think it’s indubitable that la is a functional head of sorts, since it bears no lexical information, that is, it’s not a content word. Instead, it’s part of the class of sentence-structuring grammatical morphemes like li or e. Furthermore, mi lape pona and mi pali mute are completely self-contained clauses, which makes them IPs by my analysis (more on that in a moment). Since la serves to connect the two parts, it’s reasonable to assume that it’s a complementizer (C⁰), that is, the head of a complementizer phrase (CP).

Accordingly, the lower IP, which corresponds to the f-structure labeled g, is la’s complement, while the higher IP, which corresponds to f, occupies the CP’s specifier position as you’d expect topicalized information to. Since the lower IP is disjoined from the rest of the clause, I went through with my suggestion above and tentatively annotated it with ↑ ≠ ↓. The dashed line from C′ to IP is supposed to indicate that while the whole sentence forms a unit syntactically, it doesn’t do so semantically. Unlike in the English example in figure 2, in Toki Pona, it’s easy to assume that it’s la which determines its complement to be semantically and functionally disjoined from the rest of the sentence.⁴

My analysis in figure 3 also conveniently allows for la to take another CP headed by another la as a complement in place of the lower IP to account for the fact that la phrases may also be stacked. An example of this is given in (6) along with its c- and f-structure charted in figure 4.

(6) suno ni la, pali mi la, mi pana e sona.

sun DEM TOP work 1 TOP 1 give OBJ knowledge

‘Today, at work, I was teaching.’

In spite of the term “la phrase,” the charts in figures 3 and 4 model la as not integrating with the topicalized bit. That is, suno ni la and pali mi la don’t form a phrase together, as hypothesized in figure 5, where la is modeled as a functional postposition akin to e which takes suno ni as a complement. Such a construction would be very unexpected given that Toki Pona is so very consistently head-first, as pointed out above. If la were a postposition (or something), it would also be necessary to explain how the example in (4a) is a grammatical statement, since in any other context, clauses can’t serve as complements due to the mentioned restrictions on recursion. The analysis of la presented previously skirts these issues by treating la as very similar to li, the predicate marker.⁵

As far as LFG is concerned, IP is the common category for clauses in English, as exemplified by figure 2. According to my analysis, this is the case also for Toki Pona. As a purely grammatical morpheme (a “particle”), li doesn’t have any lexical meaning; its semantics are purely functional. In canonical cases, it introduces the predicate and provides the information that the subject of the clause is a third person and that the verb is in indicative mood (cf. sona pona: s.v. li). Example (7) illustrates these properties: jan Ale ‘Alex’ is a third-person subject, and being a simple declarative statement, the sentence is in indicative mood, that is, it’s not phrased as an order (by using o in place of li).

(7) jan Ale li pali e ko pan.

person Ale IND work OBJ paste grain

‘Alex is preparing the batter.’

Looking at the c- and f-structure of this example more formally in figure 6, li is modeled as the head of IP (I⁰). IP is the functional counterpart to a verb phrase (VP), headed by a lexical verb (V⁰). The VP in the example is headed by the verb pali ‘work, make’. That I⁰ and V⁰ work together as “co-heads” is made explicit by both IP and VP uniting their grammatical information in the same f-structure, labeled f, as the arrows between the c-structure and the f-structure diagrams indicate. The same goes for the PP carrying the object (OBJ), ko pan ‘batter’, and the functional preposition e, which marks the verb’s object as such (just as a marks animate direct objects in Spanish; cf. Bresnan et al. 2016: 438). Thus, e (P⁰) and the noun phrase (NP) headed by ko (N⁰) place their grammatical information in h together as well.

Since any content word in Toki Pona can be used as any lexical part of speech and there’s no inflection whatsoever, grammatical morphemes indicating the structural makeup of a clause are an important part of the language’s grammar. Mainly, this task falls to li and e. While e requires only an object NP as a lexical complement, li requires that both its specifier and complement positions be filled with the clause’s subject and lexical verb phrase, respectively. Subjects like jan Ale are informationally prominent, and the specifier position of verbal projections is just the place for that. Because li is a functional morpheme, it can’t occur on its own and necessarily requires a lexical verb as a complement: *jan Ale li is not a complete, well-formed clause in Toki Pona. That is, li is not an auxiliary verb like English be or do, even if it occupies the same position in the tree.

Topic-marking la has the same requirements on c-structure as li, which is illustrated by the analysis in figure 3. While la contains no information on mood and isn’t subject to person agreement, it still requires that its specifier and complement positions be filled. The specifier of CP is typically the place topics are found in, and thus all kinds of phrasal categories may fill that position. Since la is a functional morpheme, it requires a complement, specifically one of some verbal category. In figure 3, this was the IP carrying the comment part of the clause, but as figure 4 shows, la may equally well be complemented by another CP, just as Toki Pona allows for chains of modal and light verbs (8a). That is, a VP containing some light or modal verb (“preverb”) may be complemented by another such VP, as long as a main-verb VP follows eventually. In the same way, an IP must eventually follow as the complement of a CP in the stack (8b).

(8) a. [IP [VP_pre [VP_pre [… [VP_main]]]]]
b. [CP [CP [… [IP]]]]
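The recursion in (8b) can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the function names parse_la_stack and _parse are made up, every token la is treated as topic-marking C⁰, the topic is left unanalyzed as a flat string, and the innermost remainder is simply labeled IP without checking that it really is a clause.

```python
def parse_la_stack(sentence: str):
    """Return nested ('CP', topic, 'la', complement) tuples ending in ('IP', clause)."""
    # Drop final punctuation and commas, then split on whitespace.
    tokens = sentence.rstrip(".!?").replace(",", "").split()
    return _parse(tokens)


def _parse(tokens):
    if "la" in tokens:
        i = tokens.index("la")  # the first la heads the outermost CP
        topic, rest = tokens[:i], tokens[i + 1:]
        if not topic or not rest:
            # la needs both its specifier and its complement filled
            raise ValueError("la requires a topic before it and a complement after it")
        # The complement is parsed recursively: another CP or, eventually, an IP
        return ("CP", " ".join(topic), "la", _parse(rest))
    if not tokens:
        raise ValueError("empty clause")
    return ("IP", " ".join(tokens))


print(parse_la_stack("suno ni la, pali mi la, mi pana e sona."))
# ('CP', 'suno ni', 'la', ('CP', 'pali mi', 'la', ('IP', 'mi pana e sona')))
```

The point of the sketch is just that each la takes everything to its right as its complement, so stacked topics nest to the right until an IP closes the chain.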

Progression

One of the more fun aspects of Toki Pona’s syntax is that coordination is by apposition of equivalent phrasal categories in most cases. Especially for those categories that have a grammatical morpheme to mark them, the second occurrence of the marker in a clause signifies the introduction of another conjunct (see Lang 2014: 56). Complications of person agreement aside (cf. sona pona: s.v. mi li and sina li), in order to coordinate predicates, there are two or more instances of li, as example (9) and the accompanying figure 7 show. In the example, the subject jan Po ‘Bo’ applies to two predicates, pali ‘works’ and moku ‘eats’.

(9) jan Po li pali, li moku.

person Po IND work IND eat

‘Bo works and eats.’

In figure 7, then, there are two predicate IPs in c-structure for each of pali and moku. Both IPs are functionally annotated with ↓ ∈ ↑. The annotation indicates that the whole of the clause’s predicate is a set, which is why f contains two f-structures, labeled f₁ and f₂. The sentence’s subject NP, jan Po ‘Bo’, is represented on the f-structure side once in f₁’s subject function (SUBJ), labeled g. Since grammatical functions spread to equivalent gaps in other conjuncts (Peterson 2004: 654–655), f₂’s subject function is indicated to be coreferential with that of f₁ by a connecting line.
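As a rough illustration of how one subject is shared across coordinated predicates, consider the following Python sketch. The function name coordinate_predicates and the dict layout are my own assumptions for illustration. The sketch only handles li-marked clauses with a single subject, and it leaves each predicate unanalyzed as a string.

```python
from typing import Dict, List


def coordinate_predicates(sentence: str) -> List[Dict]:
    """Split a clause at each li and give every conjunct the same SUBJ object."""
    tokens = sentence.rstrip(".!?").replace(",", "").split()
    if "li" not in tokens:
        raise ValueError("this sketch only handles clauses marked with li")
    first = tokens.index("li")
    if first == 0:
        raise ValueError("li requires a subject in its specifier")
    subject = {"PRED": " ".join(tokens[:first])}

    # Every further li opens a new conjunct.
    conjuncts, current = [], []
    for tok in tokens[first + 1:]:
        if tok == "li":
            conjuncts.append(current)
            current = []
        else:
            current.append(tok)
    conjuncts.append(current)
    if any(not c for c in conjuncts):
        raise ValueError("li requires a lexical predicate as its complement")

    # The same dict object is reused, standing in for the connecting line
    # that marks the subjects of the conjuncts as coreferential.
    return [{"SUBJ": subject, "PRED": " ".join(c)} for c in conjuncts]


fs = coordinate_predicates("jan Po li pali, li moku.")
print(fs)
print(fs[0]["SUBJ"] is fs[1]["SUBJ"])  # True
```

Reusing one and the same object for SUBJ is a design choice that mimics structure sharing; copying the subject into each conjunct would lose exactly the coreference that the line in figure 7 expresses.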

Regarding coordination at the clause level (CP), sona pona (s.v. And) notes that “[c]ombining multiple sentences with different subjects, predicates and objects is not possible in Toki Pona. Separate sentences must be used.” This has so far stood out to me as very peculiar, since you’d kind of expect la (C⁰) as a marker of clauses to be able to be used for clausal coordination, just like a second li (I⁰) introduces another coordinated predicate in (9).

Moreover, and no less peculiarly, Lang (2021: 9) writes that “whether to use a comma before la, after la, or to not use a comma at all is a personal stylistic choice,” and consistently (but unintuitively to me) uses a comma after la in the introductory exercise book (Lang 2014). However, I’d argue that the quoted statement is not quite correct (anymore?) because clause-initial la, as in (10), is a possibility as well.

(10) mi pali mute, la mi lape pona.

1 work much so 1 sleep well

‘I worked a lot, so I’ll sleep well.’

While tokiponists, in my experience, profess that the difference between the clause-initial use of la as in (10) and that as a topic marker in (4a) is negligible in effect, clause-initial la has more of an inferential interpretation. It usually indicates a consequence, like so, thus or hence in English. While the semantic difference may be small, la decidedly doesn’t serve as a topic marker in these instances, so I’ll assume that there are, in fact, two kinds of la which aren’t fully equivalent in syntax. Personally, I also make an intonational difference between the topic marker la with a little pause after it (hence Cobb’s term “comma intoned adverbials”), and inferential la with a little pause before it.
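If one adopts my personal convention of writing the pause as a comma, the two uses of la can be told apart mechanically. The Python sketch below does just that. Note the assumptions: the function name classify_la and the labels are invented for illustration, and the comma placement reflects only my own usage, since Lang (2021: 9) treats it as a stylistic choice. The result is therefore a heuristic, not a grammatical test.

```python
from typing import List


def classify_la(sentence: str) -> List[str]:
    """Label each la by comma placement: after it (topic) or before it (inferential)."""
    tokens = sentence.split()
    labels = []
    for i, tok in enumerate(tokens):
        if tok.rstrip(",") != "la":
            continue
        comma_after = tok.endswith(",")
        comma_before = i > 0 and tokens[i - 1].endswith(",")
        if comma_after and not comma_before:
            labels.append("topic")          # pause after la, as in (4a)
        elif comma_before and not comma_after:
            labels.append("inferential")    # pause before la, as in (10)
        else:
            labels.append("undetermined")   # no comma, or commas on both sides
    return labels


print(classify_la("mi pali mute la, mi lape pona."))   # ['topic']
print(classify_la("mi pali mute, la mi lape pona."))   # ['inferential']
```

Text written without commas around la comes out as undetermined, which is the honest answer in that case.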

I must admit that I only recently learned of F.A.N.B.O.Y.S. while doing research for my exploration of Toki Pona’s syntax. The acronym is a mnemonic for the lexical items which make up the class of conjunctions in English: for, and, nor, but, or, yet—and so. In English grammar at least, so is an inferential conjunction.⁶ Could it be one in Toki Pona as well, taking its idiosyncratic way of doing coordination constructions into account?

Toki Pona doesn’t allow subordinate clauses, but this restriction doesn’t apply to coordination, since that is between two equal parts (cf. Peterson 2004: 651–654). Thus, in terms of syntax, it’s not actually a problem to use la in clausal coordination. In terms of semantics, however, la means ‘(and) so’ in those contexts rather than simply ‘and’. Figure 8 charts the grammatical structure of example (10).

The chart shows the analysis of inferential la with two CPs in apposition. Just as with li, e, or lexical prepositions (e.g. lon ‘in, at, on’; tawa ‘to, towards’), the grammatical marker serves to introduce the second conjunct. An important difference from the construction in figure 7 with predicate-marking li is that only the second clausal conjunct is marked by la, broadly similar to how subjects are coordinated by only one en between them (e.g. mi en sina ‘you and I’; cf. Lang 2014: 56). However, this similarity may be only superficial: la in this context is still interpretable as a functional head in parallel to li, as it is still part of the system of verb-related markers. On the other hand, I’d argue that en is a conjunction proper, which enjoys a special syntactic status as such (cf. sona pona: s.v. en).

Conclusion

This blog-form essay aimed to discuss syntactic properties of the construction in Toki Pona commonly referred to as a “la phrase.” This refers to a construction on the left edge of a clause which typically contains prominent information. The grammatical marker la thus serves as essentially a topic marker. I argued that la is not syntactically part of the topic itself, so a syntactic la phrase with la as a head is an illusion. Rather, la is more conveniently interpreted as the head of a complementizer phrase (CP), requiring a topicalized constituent as its specifier, and a semantically and functionally disjunct comment or another equally disjunct CP as its complement. This makes la structurally similar to the predicate marker li, which requires a subject as its specifier and a verb phrase as its complement, albeit joining the two parts rather than disjoining them.

Moreover, multiple predicates headed by li are readily understood as coordinated by ‘and’, whereas la can’t be used to coordinate clauses in the same way. I argued that in spite of this restriction, la may form a coordination construction if used not as a topicalizer but as a clause marker with inferential semantics, that is, meaning ‘(and) so’ instead of just ‘and’. This conclusion follows directly from the analysis of topic-marking la as a complementizer.

References

Belyaev, Oleg. 2023a. Introduction to LFG. In Mary Dalrymple (ed.), The Handbook of Lexical Functional Grammar (Empirically Oriented Theoretical Morphology and Syntax 13), 3–22. Berlin: Language Science Press. DOI: 10.5281/zenodo.10185934 (🔓).
Belyaev, Oleg. 2023b. Core concepts of LFG. In Mary Dalrymple (ed.), The Handbook of Lexical Functional Grammar (Empirically Oriented Theoretical Morphology and Syntax 13), 23–96. Berlin: Language Science Press. DOI: 10.5281/zenodo.10185936 (🔓).
Bresnan, Joan, Ash Asudeh, Ida Toivonen & Stephen Wechsler. 2016. Lexical-functional syntax (Blackwell Textbooks in Linguistics 16). 2nd edn. Chichester: Wiley Blackwell.
Cobb, C. 2006. The syntax of adverbs: An LFG approach. Oxford University. (MPhil thesis). https://ora.ox.ac.uk/objects/uuid:d808caba-2c2c-4b84-8ec3-f62d78c67ab3 (visited on 07/01/2025).
Dalrymple, Mary (ed.). 2023. The Handbook of Lexical Functional Grammar (Empirically Oriented Theoretical Morphology and Syntax 13). Berlin: Language Science Press. DOI: 10.5281/zenodo.10037797 (🔓).
Dalrymple, Mary, John J. Lowe & Louise Mycock. 2019. The Oxford reference guide to Lexical Functional Grammar. Oxford: Oxford University Press. DOI: 10.1093/oso/9780198733300.001.0001 (🔒).
Kaplan, Ronald M. & Joan Bresnan. 1982. Lexical-functional grammar. A formal system for grammatical representation. In Joan Bresnan (ed.), The mental representation of grammatical relations (MIT Press Series on Cognitive Theory and Mental Representation), 173–281. Cambridge: MIT Press.
Lang, Sonja. 2014. Toki Pona: The language of good. https://www.tokipona.org (visited on 07/01/2025).
Lang, Sonja. 2021. Toki Pona dictionary.
Müller, Stefan. 2023. Grammatical theory: From transformational grammar to constraint-based approaches. 5th edn. (Textbooks in Language Sciences 1). Berlin: Language Science Press. DOI: 10.5281/zenodo.7628029 (🔓).
Peterson, Peter G. 2004. Coordination: Consequences of a lexical-functional account. Natural Language & Linguistic Theory 22(3). 643–679. DOI: 10.1023/B:NALA.0000027673.49915.2b (🔒).
sona pona. 2022. The wiki for the micro-language Toki Pona. https://sona.pona.la (visited on 07/01/2025).

¹ More strictly speaking, the la phrase contains [+PROMINENT] information that is not strictly limited to [−NEW]. This way, it may also contain a left-dislocated phrase rather than only a topic proper (cf. Dalrymple, Lowe & Mycock 2019: 369–373, 664–665).
² I’m using the hash character (#) to mark sentences which are grammatically well-formed statements but whose meaning doesn’t fit the context they’re being discussed in, so that the translation runs counter to expectation.
³ There is one exception, however. When adjuncts are present, complements are typically extraposed to the right to avoid ambiguities in modification scope, resulting in an inversion of adjunct and complement. Thus typically and importantly, the linear order of elements in the verbal domain is verb (head) – adverb (adjunct) – object (complement).
⁴ Maybe f-precedence (cf. Dalrymple, Lowe & Mycock 2019: 256–259; Belyaev 2023b: 39–41) could be used to formulate a suitable constraint.
⁵ Paul Jorgensen from Langfocus on YouTube called li the “subject marker” in a video showcase on Toki Pona released in February 2025. While I understand how he came to this conclusion by looking at the surface structure of sentences, this analysis is untenable once c-structure is taken into account.
⁶ Unlike English so, German also ‘so, thus’ is an adverb. In its conjunctional use it occupies the forefield of a clause (SpecCP) rather than the left verbal bracket (C⁰) as a conjunction proper would, so that the bracket can be used by the finite verb, triggering V2 word order. The subject is placed in the midfield (VP modifiers) because the forefield is already occupied by the adverb: also werde ich gut schlafen ‘so I will sleep well’. As in English, this sentence is a main clause.
