Fastest way to count the number of occurrences of a string
NickName:ludo Ask DateTime:2012-05-12T18:25:48

Fastest way to count the number of occurrences of a string

I was wondering what is the fastest way to count the number of occurrences of a string (needle) within another string (haystack). The way I'm doing it is:

int findWord(char * file, char * word){
 char *fptr;
 char * current = strtok_r(file, " ,.\n", &fptr);
 int sum = 0;
 while (current != NULL){
    //printf("%s\n", current);
    if(strcmp(current, word) == 0)
        sum+=1;
    current = strtok_r(NULL, " ,.\n", &fptr);
 }
 return sum;
}

Would it be faster to use a more complex algorithm (Boyer-Moore)? Thanks

Copyright Notice:Content Author:「ludo」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/10562887/fastest-way-to-count-the-number-of-occurrences-of-a-string

Answers
NPE 2012-05-12T10:36:31

\n Would it be faster to use a more complex algorithm (Boyer-Moore)?\n\n\nIn your algorithm, the unit of comparison is a word rather than a character. This enables the algorithm to ignore matches that straddle a word boundary, and thus makes it run in O(n) time.\n\nI doubt you'd be able to beat that asymptotically.\n\nAs far as lowering the multiplicative constant, right now your algorithm looks at every character in file twice. You can eliminate that redundancy by rewriting the code to use a pair of pointers and a single for loop (figuring out the details is left as an exercise for the reader :))",


Sergey Kalinichenko 2012-05-12T10:35:56

Currently, if your program is counting word \"blah\" and encounters a token is \"blahblah\", your algorithm counts it as zero occurrences. If it needed to count it as two, you cound benefit from a more advanced approach.\n\nIf your program does what you want, you are processing as fast as you can: it is already linear in the number of letters of the longer \"word\", so you cannot speed it up further.\n\nAn even more interesting solution would be required to count words with self-aliasing: for example, count \"aa\"s inside \"aaaa\" string. If you needed to return 3 for this situation, you'd need a lot more advanced algorithm.",


More about “Fastest way to count the number of occurrences of a string” related questions

Fastest way to count the number of occurrences of a string

I was wondering what is the fastest way to count the number of occurrences of a string (needle) within another string (haystack). The way I'm doing it is: int findWord(char * file, char * word){ ...

Show Detail

Fastest way to count number of occurrences in Pandas

What is the fastest way to compute the number of occurrences of elements within a Pandas series? My current fastest solution involves .groupby(columnname).size(). Is there anything faster within ...

Show Detail

fastest way to count occurrences of partial list

What is the fastest way to count an element occurrences from a start position till a stop position. list = [a,b,c,c,d,c....] can be very long count(list,c, from = 2, till = 4) = 2. we could do

Show Detail

Fastest way to count the occurrences of each unique column in a matrix in R

I'm new to R (and to stackoverflow) and I would appreciate your help. I would like to count the number of occurences of each unique column in a matrix. I have written the following code, but it is

Show Detail

Is there a faster way to count non-overlapping occurrences in a string than count()?

Given a minimum length N and a string S of 1's and 0's (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a sub-string of length n containing all '0's. For example,

Show Detail

Fastest way to count number of occurrences in a Python list

I have a Python list and I want to know what's the quickest way to count the number of occurrences of the item, '1' in this list. In my actual case, the item can occur tens of thousands of times wh...

Show Detail

Count number of occurrences for each char in a string

I want to count the number of occurrences of each character in a given string using JavaScript. For example: var str = "I want to count the number of occurrences of each char in this string&qu...

Show Detail

Fastest way of putting array elements number of occurrences between limits

For example... I have an array of integers which is initialized by random values between 1 and 1000 Array has 1M elements (it will probably have much more, but this is only the example) Number of ...

Show Detail

Emacs regexp count occurrences

I'm looking for the fastest routine (not interactively) to get the number of matches of a regexp in a string. Something like (count-occurrences "a" "alabama") => 4

Show Detail

ES6 / lodash count the number of occurrences of a character in a string

I want to count the number of occurrences of a character in a string. This stack overflow post does that using ES5 but not ES6 or Lodash: Count the number of occurrences of a character in a strin...

Show Detail