Java string permutation and combination search

I am writing an Android word application My code includes a method that can find all combinations of strings and substrings of 7 strings with a minimum length of 3 Then compare all available combinations with each word in the dictionary to find all valid words I'm using recursive methods This is the code

// Gets all the permutations of a string.
void permuteString(String beginningString,String endingString) {
    if (endingString.length() <= 1){
        if((Arrays.binarySearch(mDictionary,beginningString.toLowerCase() +   endingString.toLowerCase())) >= 0){
            mWordSet.add(beginningString + endingString);
        }
    }
    else
        for (int i = 0; i < endingString.length(); i++) {
            String newString = endingString.substring(0,i) + endingString.substring(i + 1);
            permuteString(beginningString + endingString.charAt(i),newString);
      }
}
// Get the combinations of the sub-strings. Minimum 3 letter combinations
void subStrings(String s){
    String newString = "";
    if(s.length() > 3){
        for(int x = 0; x < s.length(); x++){
            newString = removeCharAt(x,s);
            permuteString("",newString);
            subStrings(newString);
        }
    }
}

The above code works normally, but when I installed it on my nexus, I realized it was a little too slow It takes a few seconds to finish About three or four seconds is unacceptable Now I have played some word games on my mobile phone. They will immediately calculate all combinations of a string, which makes me believe that my algorithm is not efficient and can be improved Who can help?

public class TrieNode {
TrieNode a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;
TrieNode[] children = {a,z};
private ArrayList<String> words = new ArrayList<String>();

public void addWord(String word){
    words.add(word);
}
public ArrayList<String> getWords(){
    return words;
}
}
public class Trie {

static String myWord;
static String myLetters = "afinnrty";
static char[] myChars;
static Sort sort;
static TrieNode myNode = new TrieNode();
static TrieNode currentNode;
static int y = 0;
static ArrayList<String> availableWords = new ArrayList<String>();

public static void main(String[] args) {

    readWords();
    getPermutations();
}
public static void getPermutations(){
    currentNode = myNode;
    for(int x = 0; x < myLetters.length(); x++){
        if(currentNode.children[myLetters.charAt(x) - 'a'] != null){
            //availableWords.addAll(currentNode.getWords());
            currentNode = currentNode.children[myLetters.charAt(x) - 'a'];
            System.out.println(currentNode.getWords() + "" + myLetters.charAt(x));
        }
    }
    //System.out.println(availableWords);
}
public static void readWords(){
    try {
        BufferedReader in = new BufferedReader(new FileReader("c://scrabbledictionary.txt"));
        String str;
        while ((str = in.readLine()) != null) {
            myWord = str;
            myChars = str.tocharArray();
            sort = new Sort(myChars);
            insert(myNode,myChars,0);
        }
        in.close();
    } catch (IOException e) {
    }
}
public static void insert(TrieNode node,char[] myChars,int x){    
    if(x >= myChars.length){
        node.addWord(myWord);
        //System.out.println(node.getWords()+""+y);
        y++;
        return;
    }
    if(node.children[myChars[x]-'a'] == null){
        insert(node.children[myChars[x]-'a'] = new TrieNode(),x=x+1);
    }else{
        insert(node.children[myChars[x]-'a'],x=x+1);
    }
}
}

Solution

In your current method, you are looking for each permutation of each substring So for "ABC", you need to find "ABC", "ACB", "BAC", "BCA", "cab" and "CBA" If you want to find all the permutations of the permutation, you will find nearly 500000 times before you even view its substrings However, by preprocessing the dictionary, we can reduce it to one lookup, regardless of length

The idea is to put every word in the dictionary into some data structure, where each element contains a set of characters and a list of all words containing (only) these characters So, for example, you can build a binary tree that will contain a node containing (sort) the character set "abd" and the word list ["bad", "DAB"] Now, if we want to find all the permutations of "DBA", we sort them as "abd" and find them in the tree to retrieve the list

As Bauman points out, tries is well suited for storing such data The advantage of trie is that the lookup time depends only on the length of your search string - it has nothing to do with the size of your dictionary This structure is ideal because you will store a considerable number of words and most search strings will be small (most will be 3-character substrings at the lowest level of your recursion)

In this case, the path of your stunt will reflect the character set rather than the word itself Therefore, if your entire dictionary is ["bad", "DAB", "cab", "cable"], your search structure will be as follows:

There is a time / space compromise to achieve this In the simplest (fastest) method, each node contains only a word list and a child node [26] This allows you to find your child regularly. Just look at the child [s.charat (I) – 'a'] (where s is your search string and I am the depth of your current stunt)

The disadvantage is that most of your children's arrays are empty If space is a problem, you can use more compact representations, such as linked lists, dynamic arrays, hash tables, etc However, these costs are potentially multiple memory accesses and comparisons per node, rather than simple array access However, I would be surprised if the wasted space exceeds your entire dictionary by more than a few megabytes, so the array based method may be your best choice

Using trie, your entire permutation function will be replaced by a lookup, from O (n! Log D) (where D is the size of the dictionary, the size of your string, n) to o (n log n) (because you need to sort characters; the lookup itself is O (n))

Editor: I have implemented one (untested) implementation of this structure: http://pastebin.com/Qfu93E80

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>