#1
Doubt on Using English dictionary in Java
Hi, this is Sakthi.
I have to check whether the entered text is meaningful or not. For that, I have to include an English dictionary in my project. Do we have any library in Java? Could anyone tell me the procedure to proceed further? Thanks in advance.
#2
Re: Doubt on Using English dictionary in Java
You can make your own dictionary; here's how you could do this.
If you Google "word list", you will find many sites with word lists for your future dictionary. Choose the lists that suit your application. Presort the list and read it all into an array of Strings. Use binary search or other means to do your word validation. The presorting is important so that your program does not have to sort the list every time it runs.
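A minimal sketch of the approach just described, assuming the word list is already sorted, lowercase, and stored one word per line in a UTF-8 text file. The class name `SortedWordList` is hypothetical. Instead of sorting at startup, it only checks once that the input really is sorted, because `Arrays.binarySearch` gives undefined results on unsorted data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Array-backed word checker for a presorted, one-lowercase-word-per-line list (hypothetical). */
final class SortedWordList {
    private final String[] words;

    SortedWordList(String[] presorted) {
        // Binary search is only valid on sorted data, so verify the order once instead of sorting.
        for (int i = 1; i < presorted.length; i++) {
            if (presorted[i - 1].compareTo(presorted[i]) > 0) {
                throw new IllegalArgumentException("word list is not sorted at entry " + i);
            }
        }
        this.words = presorted;
    }

    /** Loads the presorted file; UTF-8 and one word per line are assumptions. */
    static SortedWordList load(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return new SortedWordList(lines.toArray(new String[0]));
    }

    boolean contains(String word) {
        // Arrays.binarySearch returns a non-negative index only when the key is present.
        return Arrays.binarySearch(words, word.toLowerCase(Locale.ROOT)) >= 0;
    }

    int size() {
        return words.length;
    }
}
```

Usage would look like `SortedWordList.load(Paths.get("words.txt")).contains("hello")`, where `words.txt` is a placeholder file name. Each lookup costs about log2(n) comparisons, roughly 19 for a list of 420,000 words.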
#3
Re: Doubt on Using English dictionary in Java
Quote:
Yeah....umm, no. You don't read "this all into an array of Strings." If you have a large word list, you can easily exceed the JRE's default heap size. A better choice is an indexed word list that gives you some meaningful entry points into your list, so that you can quickly seek to the offset in the file for the group of most similarly spelled words. Then you read a fixed or dynamic block of words and compare them against the input. Some systems use multiple files to build their indexes; others use single files. I'd probably use a sqlite database file and access it from within my Java app using a type 3 or 4 driver and then query the file using SQL for a block of words that closely matched the input. Reading everything into an array is rarely a good choice for a set of unknown size, because a Java array's length is fixed once the array is created. Also, in this case, how do you know how many words are in the word list? Let's say that you at least went through the list and counted every distinct word. Do you do that every time the program starts up? What if the dictionary feature of the app is rarely used? Have you wasted the processing and the resources on something that isn't needed? What do you do when the user wants to add a word to the dictionary? Do you keep it in a "user words" file? What happens when management decides that they absolutely LOVE the new app, but want to get it out to millions of users right now? Do we work the midnight shift, convert our app to an applet, and then kill every browser on the planet as they download our 160MB dictionary file and try to load it into memory? ...all from the sandbox of the applet? Perhaps we provide a network connection back to a master database (file or RDBMS) and maintain user dictionaries via a login/authentication method and a separate table based on the login?
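A minimal sketch of the indexed-word-list idea, assuming a sorted, lowercase, ASCII file with one word per line. The class name `IndexedWordFile` and the first-letter index are illustrative choices, not a standard format. For simplicity the index is built by one sequential pass when the file is opened; a real system would more likely store the index in its own file so startup does not touch every word. `RandomAccessFile.readLine` is unbuffered and maps each byte straight to a character, which is why the ASCII assumption matters.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sorted word file plus an in-memory index: first letter -> byte offset of that letter's block. */
final class IndexedWordFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final Map<Character, Long> groupStart = new HashMap<>();

    IndexedWordFile(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
        try {
            // Record where each first-letter group begins (rebuilt on every open in this sketch).
            long offset = file.getFilePointer();
            String line;
            while ((line = file.readLine()) != null) {
                if (!line.isEmpty()) {
                    groupStart.putIfAbsent(line.charAt(0), offset);
                }
                offset = file.getFilePointer();
            }
        } catch (IOException e) {
            file.close();
            throw e;
        }
    }

    boolean contains(String word) throws IOException {
        String key = word.toLowerCase(Locale.ROOT);
        if (key.isEmpty()) return false;
        Long start = groupStart.get(key.charAt(0));
        if (start == null) return false; // no word in the file begins with this letter
        file.seek(start);
        String line;
        // Read only this letter's block; the file is sorted, so stop once we pass the key.
        while ((line = file.readLine()) != null && !line.isEmpty() && line.charAt(0) == key.charAt(0)) {
            int cmp = line.compareTo(key);
            if (cmp == 0) return true;
            if (cmp > 0) return false;
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only one letter's block is read per lookup, so memory use stays at the size of the index (at most a few dozen entries here) rather than the whole list. A finer index (two-letter prefixes, say) shrinks the block further.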
We spend only a couple of megs, we make our app more extensible, and we leverage the capabilities of SQL to provide features such as query (select), insert, update, and delete... all so that our app can do something meaningful with the array other than a low-level:
CPP / C++ / C Code:
...of course, you did mention "binary search", so what might that require? We've got a Unicode String in Java, so for every letter of the word do we use String.getBytes() and then do a binary comparison of the bytes? Wouldn't we be better off using compareTo/compareToIgnoreCase or matches/regionMatches/startsWith or some other "higher-level" API than direct binary byte manipulation? Just something to think about....
MxB
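For what it's worth, `Arrays.binarySearch` halves the search range and compares whole Strings through `compareTo` or a supplied `Comparator`, so no `getBytes()` is involved. A minimal sketch using the JDK's `String.CASE_INSENSITIVE_ORDER` (which behaves like `compareToIgnoreCase`) for lookups, plus `regionMatches` to collect prefix matches as simple "nearby" suggestions. The class and method names are hypothetical, and for brevity it sorts in the constructor; with a presorted file you would skip that step, as long as the file was sorted with the same comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Case-insensitive word lookup over a sorted array (hypothetical helper). */
final class CaseInsensitiveLookup {
    private final String[] words;

    CaseInsensitiveLookup(String... words) {
        this.words = words.clone();
        // Sort with the same comparator the searches use; mixing orderings breaks binary search.
        Arrays.sort(this.words, String.CASE_INSENSITIVE_ORDER);
    }

    boolean contains(String word) {
        return Arrays.binarySearch(words, word, String.CASE_INSENSITIVE_ORDER) >= 0;
    }

    /** All words starting with prefix (ignoring case), in sorted order. */
    List<String> withPrefix(String prefix) {
        int i = Arrays.binarySearch(words, prefix, String.CASE_INSENSITIVE_ORDER);
        if (i < 0) i = -i - 1; // not found: insertion point = first word greater than prefix
        // If found, step back over earlier entries that also match (e.g. "Apple" and "apple").
        while (i > 0 && startsWithIgnoreCase(words[i - 1], prefix)) i--;
        List<String> out = new ArrayList<>();
        while (i < words.length && startsWithIgnoreCase(words[i], prefix)) {
            out.add(words[i++]);
        }
        return out;
    }

    private static boolean startsWithIgnoreCase(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

With the words "banana", "Apple", "apricot", "cherry", and "Application", `withPrefix("ap")` returns `[Apple, Application, apricot]`, because the words sharing a prefix sit next to each other in sorted order.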
#4
Re: Doubt on Using English dictionary in Java
Hello Bob,
I agree with everything you say if that is what the OP wants. I may have read it wrong, but the OP appears to be in the middle of a project in which he prefers to validate the words before processing them. The dictionary is not the project; it is there to sweeten the project. There is a good chance that the effort to make "a sqlite database file and access it from within my Java app using a type 3 or 4 driver and then query the file using SQL for a block of words that closely matched the input" would exceed the effort of the project itself. This is why the simplest possible approach was proposed. If he is to design a system for an enterprise, as implied by "What happens when management decides that ...", and he omitted to mention it, then he only has to amend his post with the objections, and perhaps give a better view of the environment he is working in. I completely agree with "Reading everything into an array is rarely a good choice", but when this gets the work done, I would not hesitate to do it. After all, the unused stacks and heaps are sitting there costing nothing. If it is a major project where memory is of the essence, the OP would or should have mentioned it. I know this will not please the purists, but I would deliver, and at the promised time.
#5
Re: Doubt on Using English dictionary in Java
Quote:
Based on the content of the OP's post, I think that it is very difficult to deduce any great detail beyond adding English dictionary capability to a Java program. The scope, detail, duration, schedule, etc. of the project are undefined in the post. It appears to me to be a question of architectural interrogation.
Quote:
While using the pronoun "he" is still accepted in literature and text, it may be particularly inappropriate in this case. Shakti is the concept, or personification, of divine feminine creative power, sometimes referred to as 'The Great Divine Mother' in Hinduism. --Wikipedia. Again, I don't think that you can make assumptions about the OP's preferences from this post alone. If a different thread also discussed the project, I didn't read it, so maybe I'm missing information. However, as I read the OP's top-level post in this thread, we have only dictionary + English + Java + doubt.
Quote:
I don't know why you feel that using a sqlite database would be so challenging, or how the scope of the project relates to the apparent complexity of using it. I think that it would take me perhaps 30 minutes to implement, plus perhaps another 20 to convert a known word list of mine to a sqlite database. If I round up to an hour and even add another hour, I don't think that I'm wasting time exploring that option compared to the relative challenges of adding even basic search capabilities to a literally "dumb" array.
Quote:
I don't mean to play devil's advocate, but I don't believe that the proposed approach was truly the simplest. It may be the easiest that you can consider, but I'd think that the minimalist "reasonably considerable" storage facility would be a Vector. Its dynamic growth, random access, and rudimentary search capabilities make it overwhelmingly more beneficial than a "dumb" array that basically knows only its size.
Quote:
I don't think that you can discount that possibility (enterprise system design) now; however, that wasn't the point that I was making. My point was that a flexible storage system that offers advanced interfacing capabilities allows for expanded requirements of the system. In "the real world", "management" always expands requirements as soon as they see something working that they like. It doesn't hurt to be reasonably prepared for that eventuality while also selecting an appropriate storage facility that lets you focus on implementing the business rules rather than on low-level management of the storage.
Quote:
I agree absolutely, but I don't think that this really gets the work done in this case. Granted, given the details of the post (or lack thereof), we can't really decide what the requirements are for the project. Therefore, I may have over-engineered them while you perhaps over-simplified them. Perhaps some middle ground is closer to the OP's needs? Who knows?
Quote:
The notion that unused stacks and heaps cost nothing is probably just plain wrong. If nothing else, they must add overhead to startup time. With memory speeds in the 533-667MHz range and faster, one could argue that it doesn't really matter, particularly if "real" memory is never accessed, only virtualized. And what is a few microseconds between friends when we have a 3GHz x86 spewing bits around like a banshee? But what if we're operating in a cellphone or other resource-constrained environment? I disagree with the fundamental premise that your array will allow you to deliver on time, particularly in light of the lack of requirements. What I think you're doing is over-simplifying "what" a dictionary is and assuming that "what" a dictionary is can be adequately represented by an array of words. If we assume that the OP only wants to qualify a word for its spelling and/or existence in the dictionary, then perhaps your array is at least a functional choice, but still one of greatly reduced capability compared to the Vector class. If we assume that the OP wants what could be defined as a typical printed English dictionary "entry," then an array of words becomes exceedingly problematic for even the fundamental storage needs. You then end up with an array of class objects, each representing an entry in the dictionary, with no real capability in the storage itself. Still, a Vector would be far superior. Obviously, without even a basic set of requirements, we can never really know what the OP wants. However, I hope that we both now feel that the array is probably not the right choice regardless of what the minimal set of requirements is, even if it is only to validate the existence of the word in the dictionary and/or to offer nearby close matches to the input.
While you're still writing your for loop logic, I'll have completed: v.contains(word); ...not sure who's going to be "on time" in this deal, but I can guess that it will be the one who uses the most powerful API that requires the least amount of code to implement the minimum requirement for the project.
MxB
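A minimal sketch of the Vector-based word store discussed above. The class name `VectorDictionary` and the lowercase normalization are assumptions for illustration. `Vector` grows its backing array on demand, so user-added words need no pre-counted size. `Vector.contains` checks elements with `equals` one at a time, which is a linear scan; `ArrayList` is the unsynchronized counterpart that most newer code would use for the same purpose.

```java
import java.util.Locale;
import java.util.Vector;

/** Growable word store backed by java.util.Vector (hypothetical helper). */
final class VectorDictionary {
    private final Vector<String> words = new Vector<>();

    void add(String word) {
        // No capacity planning needed: Vector enlarges its internal array as elements are added.
        words.add(word.toLowerCase(Locale.ROOT));
    }

    boolean contains(String word) {
        // Linear scan comparing each stored element with equals().
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    int size() {
        return words.size();
    }
}
```

Adding a user's own word at runtime is just another `add` call, which is the kind of flexibility the post contrasts with a fixed-length array.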
#6
Re: Doubt on Using English dictionary in Java
Thank you for the excellent information you have provided to help people on this forum. It would be invaluable.
#7
Re: Doubt on Using English dictionary in Java
Quote:
You're right, it would be invaluable if only there was some code to go along with it!
JAVA Code:
Output: Code:
...but I'm not much of a Java programmer these days, so somebody with better or more recent skills should be able to write much better code. Obviously, one would want to conserve connections and such, and a lot of the code is duplicated, but it was easy to cut and paste for those functions that needed to access the database (all but main?). It took me perhaps 4 minutes to create a file of "insert" statements from a word list that I already had with each word on its own line. It took sqlite about 3 hours (I let it run overnight) to read in the 420+ thousand words from the file. It took me about 40 minutes to write and debug what I have of the code using my "advanced IDE", aka "vim", over an ssh connection to a Linux box using gij. Probably 10 minutes of that time went into figuring out my "-cp" (classpath) to get it to load the sqlitejdbc jar file properly... not having done it since probably v1.1.3 of the JDK. Another few minutes went to copying the files over ssh from my Linux box to my MacBook and writing this reply. My point is that it took about 50 minutes in total to come up with a fundamental set of interfaces that support a relatively unlimited set of capabilities with regard to "words" in the "list", and it is extensible. I can easily add greater querying strength to better match what a user might be looking for, such as a word that is five letters long and contains two vowels. Imagine trying to do that using an array or even a Vector. Granted, I am packing around a rather hefty (~2MB) jar file for the JDBC driver package and a ~17MB (uncompressed) database file, and that alone may be a deal breaker, particularly in an embedded system. However, compressing the database into a jar file with the driver jar takes only ~3MB in total. Who knows if we'd have the RAM available for the execution environment?
MxB
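The code from the post above did not survive in this copy of the thread, so what follows is a separate sketch of the same idea, not MxB's code. It assumes a SQLite JDBC driver jar on the classpath that accepts `jdbc:sqlite:` URLs, and a hypothetical table `words(word TEXT PRIMARY KEY)`; the class and method names are also hypothetical. SQLite normally runs each INSERT in its own transaction, so loading hundreds of thousands of words that way is slow; batching them inside one explicit transaction is usually much faster. The query shows the "five letters, two vowels" case in plain SQL, counting only a, e, i, o, u.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: needs a SQLite JDBC driver at runtime; table/column names are assumptions. */
final class SqliteWordStore {
    // Vowel count = length(word) minus length of the word with a, e, i, o, u removed.
    static final String VOWEL_COUNT =
        "length(word) - length(replace(replace(replace(replace(replace(lower(word),"
        + "'a',''),'e',''),'i',''),'o',''),'u',''))";

    /** Trims and lowercases one word-list line; returns null for blank lines. */
    static String normalize(String line) {
        String w = line.trim().toLowerCase(Locale.ROOT);
        return w.isEmpty() ? null : w;
    }

    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE IF NOT EXISTS words(word TEXT PRIMARY KEY)");
        }
    }

    static void loadWords(Connection conn, List<String> lines) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole load, not one per INSERT
        try (PreparedStatement ps = conn.prepareStatement("INSERT OR IGNORE INTO words(word) VALUES (?)")) {
            for (String line : lines) {
                String w = normalize(line);
                if (w == null) continue;
                ps.setString(1, w);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    /** Words of the given length containing exactly the given number of vowels. */
    static List<String> findByLengthAndVowels(Connection conn, int length, int vowels) throws SQLException {
        String sql = "SELECT word FROM words WHERE length(word) = ? AND " + VOWEL_COUNT + " = ? ORDER BY word";
        List<String> out = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, length);
            ps.setInt(2, vowels);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.add(rs.getString(1));
                }
            }
        }
        return out;
    }
}
```

With the driver available, `DriverManager.getConnection("jdbc:sqlite:words.db")` (the file name is a placeholder) would open the database; after `createTable` and `loadWords`, `findByLengthAndVowels(conn, 5, 2)` runs the five-letter, two-vowel query.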