Legofit
infers population history from nucleotide site patterns.
Functions
tokenizer.c File Reference

Tokenize a character string. More...

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "misc.h"
#include "tokenizer.h"

Functions

Tokenizer * Tokenizer_new (int maxTokens)
 Tokenizer constructor.
 
void Tokenizer_free (Tokenizer *self)
 Tokenizer destructor.
 
int Tokenizer_split (Tokenizer *self, char *buff, const char *sep)
 Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep". More...
 
char * Tokenizer_token (Tokenizer *t, int ndx)
 Return pointer to token with given index.
 
int Tokenizer_strip (Tokenizer *t, const char *extraneous)
 Strip extraneous chars (those listed in "extraneous") from both ends of each token. More...
 
int Tokenizer_ntokens (Tokenizer *t)
 Return number of tokens.
 
int Tokenizer_find (Tokenizer *t, const char *s)
 Search for string s among tokens. More...
 
void Tokenizer_printSummary (const Tokenizer *tkz, FILE *ofp)
 Print a summary of the information in a Tokenizer.
 
void Tokenizer_print (const Tokenizer *tkz, FILE *ofp)
 Print Tokenizer object.
 
void Tokenizer_clear (Tokenizer *self)
 

Detailed Description

Tokenize a character string.

Author
Alan R. Rogers

This file implements a class that tokenizes a string. Usage is like this:

int ntokens;
char buff[100];
Tokenizer *tkz = Tokenizer_new(maxTokens);

strcpy(buff, "my: input ; ; string");
ntokens = Tokenizer_split(tkz, buff, ":;");

The 3rd argument to Tokenizer_split defines the characters that separate tokens. Then you can access individual tokens like this:

char *token;

token = Tokenizer_token(tkz, 3);

The tokens themselves still reside in buff, so this array must not change until the next call to Tokenizer_split.

The memory allocated by Tokenizer_new is freed by Tokenizer_free.

The argument to Tokenizer_new determines the initial size of an internal array of pointers to tokens. If Tokenizer_split is handed a string with more than this number of tokens, it will re-allocate the internal array. It is legal to initialize with Tokenizer_new(0).

Function Documentation

◆ Tokenizer_find()

int Tokenizer_find (Tokenizer *t, const char *s)

Search for string s among tokens.

On success, return the index of the matching token. On failure, return the current number of tokens. To detect failure, compare the returned value with that of Tokenizer_ntokens after each call.

◆ Tokenizer_split()

int Tokenizer_split (Tokenizer *self, char *buff, const char *sep)

Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep".

Suppress empty tokens. Return the number of tokens.

References strCountSetChunks().

Referenced by DAFReader_next(), and RAFReader_next().

◆ Tokenizer_strip()

int Tokenizer_strip (Tokenizer *t, const char *extraneous)

Strip extraneous chars (those listed in "extraneous") from both ends of each token.

If the result is an empty string, this token is removed from the list. The function returns the number of tokens.

Referenced by DAFReader_next(), and RAFReader_next().