After crawling the data: sorting it cleverly into a data structure before designing the search

When there’s a huge amount of data in the database, it’s important that the data is sorted in a clever way, so that it is easier to mine search results that match the user’s keywords.


Search engine bot developers have to decide on a usable sorting algorithm for a bot that is running, collecting new data and sorting it at the same time.
Once the collected data is sorted into some data structure, we have to decide which search algorithm to use, so that the best matching results for a keyword are found in optimal time and with less effort.


Typical data structures in computer science are:

Stack – Think of a stack of plates at your local lunch place. It works in a Last In, First Out (LIFO) way.
Queue – Like a queue at the grocery store. It works in a First In, First Out (FIFO) way.
Tree – A tree has a parent branch, and the parent branch has child branches. There’s a logical order for going through the tree when we have to find data.
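
The LIFO and FIFO orders are easy to see in a few lines of standard Java. This is only a minimal sketch, not code from the project: the class and method names (StackQueueDemo, lifoOrder, fifoOrder) are made up for illustration, and ArrayDeque from java.util is used as both the stack and the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

// Hypothetical demo class, not part of the project code.
class StackQueueDemo {

    // Pushes every item onto a stack, then pops them all: last in, first out.
    static String lifoOrder(String... items) {
        Deque<String> stack = new ArrayDeque<String>();
        for (String item : items) {
            stack.push(item);
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    // Adds every item to a queue, then polls them all: first in, first out.
    static String fifoOrder(String... items) {
        Queue<String> queue = new ArrayDeque<String>();
        for (String item : items) {
            queue.offer(item);
        }
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.poll());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lifoOrder("a", "b", "c"));
        System.out.println(fifoOrder("a", "b", "c"));
    }
}
```

The stack hands the plates back in reverse order, while the queue serves them in the order they arrived.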

As I mentioned a couple of posts earlier, a binary tree might be a clever structure to collect data into, because it is fast to go through if the data has been sorted with good logic.
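
To make the binary tree idea concrete, here is a minimal sketch of a binary search tree for keywords in standard Java. It is an assumption of how such a tree could look, not the project’s actual code: the class name KeywordTree and its methods are hypothetical, and the tree does no rebalancing.

```java
// Hypothetical sketch of an unbalanced binary search tree for keywords.
class KeywordTree {

    private static final class Node {
        final String key;
        Node left;
        Node right;

        Node(String key) {
            this.key = key;
        }
    }

    private Node root;

    // Inserts a keyword; duplicates are ignored.
    void insert(String key) {
        root = insert(root, key);
    }

    private Node insert(Node node, String key) {
        if (node == null) {
            return new Node(key);
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            node.left = insert(node.left, key);
        } else if (cmp > 0) {
            node.right = insert(node.right, key);
        }
        return node;
    }

    // Walks down one branch only, so each step discards a whole subtree.
    boolean contains(String key) {
        Node node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return true;
            }
            node = cmp < 0 ? node.left : node.right;
        }
        return false;
    }
}
```

The search is fast only while the tree stays reasonably balanced. If the keywords are inserted in already sorted order, this simple version degenerates into a linked list, and a lookup has to visit every node.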

Sorting algorithms in theory (a couple of online resources; find out more by using Google!):

https://en.wikipedia.org/wiki/Sorting_algorithm
https://brilliant.org/wiki/sorting-algorithms/

After we’ve learnt the sorting… then we head for the searching…
https://en.wikipedia.org/wiki/Search_algorithm

Am I Losing my Religion?

Summer has started with me feeling down. I’ve played online poker and the first-person shooter Rage 2 on my laptop with high-performance settings more than I’ve programmed. Lonely nights for the long, long summer. No money or girlfriend. Not everybody finds a girlfriend.

There’s extra excitement in my life while I wait for the study place results. I applied for the Work Informatics Master’s studies at the Turku School of Economics at the University of Turku. The results will be clear three weeks from this weekend.

Dropping plain Java 8 SE for the sake of growth and development

As the followers of this project and blog have read, I have written that I would not use any external third-party libraries or frameworks. A radical change to that is coming.

My Contributor inspired me to use the Java/Maven/Spring Boot technology repertoire and jTestUnit for the final build, if there ever is a build some day. I accept that.

I wrote my Bachelor of Business Administration thesis in Information Technology about the WordPress content management system and creating a responsive theme with the Twitter Bootstrap framework, but I’ve never been in the Java framework world. Now it’s time to break this illusion out of my life.

It’s more clever to try to develop a build that you can run on a server than to write standard Java code by reinventing the wheel with no use after all. The code looks like I’ve been drunk while coding it. I have not been drunk while coding. You can’t think algorithmically and write object-oriented software if you are drunk, in my opinion.

Even if this never becomes a real business, or even if I never start up a Linux server to run the robot engines that collect and mine data deeper and deeper from the Internet, perhaps I might be happy some day if a working build is released.

If this project grows into a real business some day, a limited company (Ltd) for the international markets costs 2500 euros, but under current Finnish law there is a lot of bureaucracy and additional fees involved in hiring a first employee. The Finnish government is showing the green light to technology export companies, or at least it should be. Finland is a small Nordic country with only five million citizens, and we have a lot of professionalism and talent to conquer the world with.

I like small companies, and I’m a member of the Left Alliance party, and we push for green lights when it comes to hiring the first employee with less effort and lower costs.

I have contacts from my past life and among my former fellow students who are skilled in programming, network security, legal issues and accounting. In my experience Finland is a safe country to be an entrepreneur in, but you have to have a killer product if you are founding a start-up-inspired company.

Technology is something you can patent, even though there has been discussion in the EU area about software patenting, and the present Finnish president Sauli Niinistö has taken part in this discussion. I think he has a positive attitude towards software patents inside the EU area, and that’s our man talking. Just as my defence lawyer is my man when it comes to business-related topics, Sauli Niinistö has been a good leader of Finland. Cheers for Sauli!

Tuned up the SkeletonMiner.java – Data Type

After releasing my SkeletonMiner.java and getting Petri’s review and comments on Facebook, I made some changes to it.

I replaced all the ”StringBuilder”-typed parameters with StringBuffer-typed ones, and added the element type (<String>) to all the ArrayList declarations.

Additionally, after programming a simple replacement for the StringBuilder, I programmed my own method for creating a String from a given character list. These changes are now up to date on this project’s GitHub profile in the ”SkeletonMiner.java” implementation.
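
A method that builds a String from a character list could look roughly like the sketch below. It is only an illustration under my own assumptions, not a copy of the code on GitHub: the class name CharListSketch, the method name fromCharacterList and its List<Character> parameter are hypothetical.

```java
import java.util.List;

// Hypothetical sketch; the names are assumptions, not the project's own.
class CharListSketch {

    // Appends every character of the list to a StringBuffer, in list order.
    static String fromCharacterList(List<Character> characters) {
        StringBuffer buffer = new StringBuffer(characters.size());
        for (Character c : characters) {
            buffer.append(c.charValue());
        }
        return buffer.toString();
    }
}
```

The buffer is created once with the final size known, so no extra objects are needed while the characters are appended.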

Now there are three useful, reusable String-related methods for any program there, and I pushed the ”StringLibrary.java” source code to the ”Java-Stuff” folder of my personal GitHub account just for fun. Please feel free to take advantage of them!

I also wrote a new post about the birth of the new String code library on my computer-related blog at https://tietokoneblogi.wordpress.com (or, in short, http://www.tietokoneblogi.net, my oldest parked domain, which has pointed to that blog ever since its birth over five years ago).

Getting outgoing links out of the source code for the next step

Today I added the new SkeletonMiner.java data type to this project’s GitHub profile. At least I can now get the outgoing links from a URL’s source code for the next step.
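
Pulling the outgoing links out of a page’s source code can be sketched with nothing but standard Java. This is not how SkeletonMiner.java does it; it is a simplified assumption that uses a regular expression, and the names LinkSketch and extractLinks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; SkeletonMiner.java itself may work quite differently.
class LinkSketch {

    // Matches href="..." or href='...' in any letter case and captures the value.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value in the order it appears in the source.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }
}
```

A regular expression like this is a rough tool: it also picks up href values from tags other than anchors, it misses unquoted attribute values, and it does not turn relative links into absolute ones.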

There are a couple of reusable methods for any program: clearing a StringBuffer with the ”clearBuffer()” method, and reversing a given String parameter and returning the reversed String.
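
The two helpers could be written roughly like this. It is a minimal sketch under my own assumptions: the post only names ”clearBuffer()”, so the StringBuffer parameter, the method name reverse and the class name StringSketch are hypothetical.

```java
// Hypothetical sketch; only the name clearBuffer() comes from the post.
class StringSketch {

    // Empties the buffer in place instead of allocating a new one.
    static void clearBuffer(StringBuffer buffer) {
        buffer.setLength(0);
    }

    // Copies the characters from the last one to the first one.
    static String reverse(String text) {
        StringBuffer reversed = new StringBuffer(text.length());
        for (int i = text.length() - 1; i >= 0; i--) {
            reversed.append(text.charAt(i));
        }
        return reversed.toString();
    }
}
```

Clearing with setLength(0) reuses the same object, which fits the goal of not creating new objects more often than needed.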

I’m waiting for Petri’s review, and he will probably raise a question about the hard-to-understand method for parsing out the rest of the document. Wasted memory is also in the line of fire, because I know that less is more when it comes to memory use. Every time a new object is created with the ”new” keyword, memory is used, and there are a lot of time-consuming lines of code in the SkeletonMiner.java class.

Taking a deeper look at the two parsing methods, the code in them is similar, so there is work to do to reduce them to one parsing method that would be more reusable. Comparing the two methods with each other, we see that there are repeated lines of code. Repeated code goes against programming theory, and reusable code is something I always keep in mind when possible. Apologies for that.
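
One way to merge two nearly identical parsing methods is to turn whatever differs between them into parameters. The sketch below is purely hypothetical, because the two methods are not shown here: it assumes that both of them cut out the text between a start marker and an end marker, and the names ParseSketch and between are invented.

```java
// Hypothetical sketch; it assumes both original methods differ only in their markers.
class ParseSketch {

    // Returns the text between the first startMarker and the next endMarker,
    // or an empty String if either marker is missing.
    static String between(String source, String startMarker, String endMarker) {
        int start = source.indexOf(startMarker);
        if (start < 0) {
            return "";
        }
        start += startMarker.length();
        int end = source.indexOf(endMarker, start);
        if (end < 0) {
            return "";
        }
        return source.substring(start, end);
    }
}
```

With a method like this, each of the old methods shrinks to a single call with its own pair of markers, for example between(source, "<title>", "</title>").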

Some updates to the Blog sites

After I answered Petri’s comment and published the ”Inventing the Cycle” post, Petri sent me a Facebook friend request, and I asked him to join this blog as a Contributor. I added an ”Authors” page, and I suppose that Petri will accept my Contributor invitation and add his own killer introduction to the credits soon.

Additionally, I increased the amount of text on the ”About” page. Now there’s enough background information about the purpose of this project.

Inventing the Cycle again

Petri Airio commented on my last post that I could use the jsoup library for easier document handling. This reveals that I’m reinventing the wheel.

I started from scratch with an empty document in the Eclipse Kepler IDE, using Java Standard Edition 8 without any additional libraries, frameworks or other software solutions made to make a developer’s life easier. Real-life companies usually use ready-made libraries, frameworks and other solutions to get to the top with less effort.

My goal is not to do rocket science. I know the complexity of well-developed search engines like Google, for example. Topics like artificial intelligence, machine learning, and text and data mining are huge academic research fields, and I’m no expert when it comes to these topics.

The approach of using standard Java code in this kind of project is a good exercise for maintaining practical Java programming skills and getting some experience with different kinds of problems that I have to find a way to solve.

If a ready-to-use SE bot starts collecting data 24/7 some day, there will be data storage problems too, because of the huge amount of data available. Time complexity is only one challenge. But with, for example, a JSON data storage solution, and if the data is stored in a tree structure, going through the binary tree is in my opinion not such a time-consuming process when calculating matches for users’ keywords from the collected data.