Legofit
infers population history from nucleotide site patterns.
Tokenize a character string.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "misc.h"
#include "tokenizer.h"
Functions

Tokenizer * Tokenizer_new(int maxTokens)
    Tokenizer constructor.

void Tokenizer_free(Tokenizer *self)
    Tokenizer destructor.

int Tokenizer_split(Tokenizer *self, char *buff, const char *sep)
    Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep".

char * Tokenizer_token(Tokenizer *t, int ndx)
    Return pointer to token with given index.

int Tokenizer_strip(Tokenizer *t, const char *extraneous)
    Strip extraneous characters (those listed in "extraneous") from both ends of each token.

int Tokenizer_ntokens(Tokenizer *t)
    Return number of tokens.

int Tokenizer_find(Tokenizer *t, const char *s)
    Search for string s among tokens.

void Tokenizer_printSummary(const Tokenizer *tkz, FILE *ofp)
    Print a summary of the information in a Tokenizer.

void Tokenizer_print(const Tokenizer *tkz, FILE *ofp)
    Print Tokenizer object.

void Tokenizer_clear(Tokenizer *self)
Tokenize a character string.
This file implements a class that tokenizes a string. Usage is like this:

    int ntokens;
    char buff[100];
    Tokenizer *tkz = Tokenizer_new(maxTokens);

    strcpy(buff, "my: input ; ; string");
    ntokens = Tokenizer_split(tkz, buff, ":;");

The 3rd argument to Tokenizer_split defines the characters that separate tokens. Then you can access individual tokens like this:

    char *token;
    token = Tokenizer_token(tkz, 3);
The tokens themselves still reside in buff, so this array must not change until the next call to Tokenizer_split.
The memory allocated by Tokenizer_new is freed by Tokenizer_free.
The argument to Tokenizer_new determines the initial size of an internal array of pointers to tokens. If Tokenizer_split is handed a string with more than this number of tokens, it will re-allocate the internal array. It is legal to initialize with Tokenizer_new(0).
int Tokenizer_find(Tokenizer *t, const char *s)
Search for string s among tokens.
On success, return the index of the token. On failure, return the current number of tokens. To detect failure, compare the returned value with the value returned by Tokenizer_ntokens.
int Tokenizer_split(Tokenizer *self, char *buff, const char *sep)
Turn string "buff" into an array of tokens, assuming that tokens in the input string may be separated by any of the characters in string "sep".
Suppress empty tokens. Return the number of tokens.
References strCountSetChunks().
Referenced by DAFReader_next(), and RAFReader_next().
int Tokenizer_strip(Tokenizer *t, const char *extraneous)
Strip extraneous characters (those listed in "extraneous") from both ends of each token.
If the result is an empty string, this token is removed from the list. The function returns the number of tokens.
Referenced by DAFReader_next(), and RAFReader_next().