Legofit
infers population history from nucleotide site patterns.
Functions
tokenizer.c File Reference

Tokenize a character string. More...

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "misc.h"
#include "tokenizer.h"

Functions

Tokenizer * Tokenizer_new (int maxTokens)
 Tokenizer constructor.
 
void Tokenizer_free (Tokenizer *self)
 Tokenizer destructor.
 
int Tokenizer_split (Tokenizer *self, char *buff, const char *sep)
 Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep". More...
 
char * Tokenizer_token (Tokenizer *t, int ndx)
 Return pointer to token with given index.
 
int Tokenizer_strip (Tokenizer *t, const char *extraneous)
 Strip extraneous chars (those listed in "extraneous") from both ends of each token. More...
 
int Tokenizer_ntokens (Tokenizer *t)
 Return number of tokens.
 
int Tokenizer_find (Tokenizer *t, const char *s)
 Search for string s among tokens. More...
 
void Tokenizer_printSummary (const Tokenizer *tkz, FILE *ofp)
 Print a summary of the information in a Tokenizer.
 
void Tokenizer_print (const Tokenizer *tkz, FILE *ofp)
 Print Tokenizer object.
 
void Tokenizer_clear (Tokenizer *self)
 

Detailed Description

Tokenize a character string.

Author
Alan R. Rogers

This file implements a class that tokenizes a string. Usage is like this:

int ntokens;
char buff[100];
Tokenizer *tkz = Tokenizer_new(maxTokens);

strcpy(buff, "my: input ; ; string");
ntokens = Tokenizer_split(tkz, buff, ":;");

The 3rd argument to Tokenizer_split defines the characters that separate tokens. Then you can access individual tokens like this:

char *token;

token = Tokenizer_token(tkz, 3);

The tokens themselves still reside in buff, so this array must not change until the next call to Tokenizer_split.

The memory allocated by Tokenizer_new is freed by Tokenizer_free.

The argument to Tokenizer_new determines the initial size of an internal array of pointers to tokens. If Tokenizer_split is handed a string with more than this number of tokens, it will re-allocate the internal array. It is legal to initialize with Tokenizer_new(0).

Function Documentation

◆ Tokenizer_find()

int Tokenizer_find (Tokenizer *t, const char *s)

Search for string s among tokens.

On success, return the index of the matching token. On failure, return the current number of tokens. To detect failure, compare the returned value with that of Tokenizer_ntokens after each call.

◆ Tokenizer_split()

int Tokenizer_split (Tokenizer *self, char *buff, const char *sep)

Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep".

Suppress empty tokens. Return the number of tokens.

References strCountSetChunks().

Referenced by DAFReader_next(), and RAFReader_next().

◆ Tokenizer_strip()

int Tokenizer_strip (Tokenizer *t, const char *extraneous)

Strip extraneous chars (those listed in "extraneous") from both ends of each token.

If the result is an empty string, this token is removed from the list. The function returns the number of tokens.

Referenced by DAFReader_next(), and RAFReader_next().