NLIDB Progress

7th April
Calculated cosine similarity using TensorFlow. Will upload pretrained GloVe word embeddings and use them for grouping words. It's a lot of work: PythonAnywhere restricts free accounts to a whitelist of sites, and I don't want to move to a paid plan before I check out the features on my laptop, so I have to download and upload everything manually. Too much whitelisting.
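A minimal sketch of the kind of thing I mean, assuming the standard glove.6B text format (one word plus floats per line); the file path and probe words are placeholders:

```python
# Cosine similarity over pretrained GloVe vectors, computed with TensorFlow.
import numpy as np
import tensorflow as tf

def load_glove(path):
    """Parse the glove.6B-style text format into a word -> vector dict."""
    vectors = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity = dot product of the L2-normalized vectors."""
    a = tf.nn.l2_normalize(tf.constant(a), axis=0)
    b = tf.nn.l2_normalize(tf.constant(b), axis=0)
    return float(tf.reduce_sum(a * b))

glove = load_glove("glove.6B.50d.txt")  # placeholder path
print(cosine_similarity(glove["doctor"], glove["physician"]))  # high
print(cosine_similarity(glove["doctor"], glove["banana"]))     # low
```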
8th April
Installed SLING and it is working. SLING can be a substitute for FrameNet. This could be a pretty good parser (considering it does not involve supervised learning). Got the SyntaxNet API running. NLIDB progress is looking good. Thanks, Google.
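For reference, this is roughly the usage shown in SLING's documentation at the time (the pretrained model file "sempar.flow" has to be downloaded separately; the sentence is a placeholder):

```python
import sling

# Load the pretrained frame-semantic parser model and parse a sentence.
parser = sling.Parser("sempar.flow")
doc = parser.parse("John hit the ball with a bat.")

# Dump the semantic frames and list the mention spans the parser found.
print(doc.frame.data(pretty=True))
for mention in doc.mentions:
    print("mention:", doc.phrase(mention.begin, mention.end))
```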
9th April
Got started with full-text search using Whoosh. It should help with generic queries that can't be answered from the database alone.
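A sketch of the Whoosh fallback: index the free-text rows once, then run full-text queries over them. The field names and documents here are illustrative:

```python
import os
from whoosh import index
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser

# Define the index schema and create the index directory.
schema = Schema(row_id=ID(stored=True), body=TEXT(stored=True))
os.makedirs("ftsindex", exist_ok=True)
ix = index.create_in("ftsindex", schema)

# Index a couple of sample rows.
writer = ix.writer()
writer.add_document(row_id=u"1", body=u"Aspirin is used to reduce fever and pain.")
writer.add_document(row_id=u"2", body=u"Ibuprofen is an anti-inflammatory drug.")
writer.commit()

# Run a full-text query that plain SQL matching would miss.
with ix.searcher() as searcher:
    query = QueryParser("body", ix.schema).parse(u"fever")
    for hit in searcher.search(query):
        print(hit["row_id"], hit["body"])
```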
10th April
Parsing in SLING is progressing a bit slowly. The data structures are a bit weird, but I have figured out most aspects of them. The weather is making me feel depressed and I don't feel like working anymore. Will work properly after I get a laptop.
Need to build an ontology. Will do it tomorrow.
11th April
Will start testing on some open-source databases. Started working on the ontology. I want to keep it portable and roll out the first version ASAP; it can be improved later. The main backend will be a graph database.
Figured out the ontology. It has evolved into more than an NLIDB. Got bored of studying graph databases. Will take a break.
First I will make a basic version using SQL and move to a graph database later. Google SLING is pretty good; I will write a tutorial on how to use it. It captures proto-agent, proto-patient, etc. quite nicely.
13th April
Have almost figured out the entire logic. Will complete the first draft by Monday.
14th April
Figured out the outline for the first draft. Need to start writing the code. It would be of great help to NLP researchers if semantic concepts were mapped in a way similar to the word-vector (word2vec) model. Perhaps something like the semantic web could achieve this effect. I suppose an n-gram model already does it to some extent. Is this data publicly available?
It should be a pretty cool NLIDB considering it does not use any training data. Hopefully I will complete it in the coming week. The delay in my laptop's arrival is screwing things up.
I think my method is fairly novel and has not been tried before. Looks good.
If you look at SLING carefully, it does a lot of things, but it is a bit inconsistent. I do not want to sound critical, but this exposes some of the flaws of deep learning.
17th April
Made some progress on parsing SLING output. However, the frame IDs reset when I loop through the frames. If anyone knows a fix, let me know. I have done a lot of wrangling but do not see any solution to the frame-ID-resetting problem, so for now regex over the printed output is the only solution. I will start building a framework using regex.
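Something along these lines: run a regex over the pretty-printed frame dump instead of walking the frame store. The sample dump and the pattern below are only illustrative assumptions about the notation and would need adjusting to the real output:

```python
import re

# Assumed shape of a pretty-printed SLING frame (illustrative only).
dump = """
{=#1 :/pb/hit.01
  /pb/ARG0: {=#2 :/saft/person "John"}
  /pb/ARG1: {=#3 :/saft/thing "ball"}}
"""

# Pull out (role, filler) pairs such as ("ARG0", "John").
role_pattern = re.compile(r"/pb/(ARG\d):\s*\{[^\"]*\"([^\"]+)\"")
for role, filler in role_pattern.findall(dump):
    print(role, "->", filler)
```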
18th April
Started work on a rule-based NLIDB. Designed and coded a very rudimentary system; will try to build a nice one in four days. It will complement SLING and the syntactic parser.
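As a rough illustration of the rule-based idea, a few regex rules can map simple English question shapes straight to SQL templates. The rules and table names below are hypothetical; the real rules would come from the ontology:

```python
import re

# Each rule pairs a question pattern with a SQL template.
RULES = [
    (re.compile(r"how many (\w+)", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)};"),
    (re.compile(r"list all (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)};"),
]

def to_sql(question):
    """Return SQL for the first matching rule, or None if nothing matches."""
    for pattern, template in RULES:
        m = pattern.search(question)
        if m:
            return template(m)
    return None

print(to_sql("How many patients are there?"))  # SELECT COUNT(*) FROM patients;
```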
20th April
I think I can build a good rule-based NLIDB. The pieces are falling into place. It is a crude approach, but I think I can make it work. I am taking small steps; no point in planning a grand system. Hopefully the small steps will come together in the end. I think this will be a pretty good NLIDB, probably rivalling a lot of current NLIDBs.
21st April
Progress is fine but slow. Bad weather depresses me, and I don't feel like working in it. This could probably be one of the best unsupervised NLIDBs. The best takes time. I have realized that NLIDBs are not that difficult.
23rd April
Got a laptop. It has screwed up my data consumption and I have run out of data. Will focus on the logic for the time being. Made some progress on it; the small steps are coming together.
24th April
The word embeddings output looks decent at first, but on more thorough checking it turns out to be a bit crude, not strong enough to be effective. I will have to use ConceptNet, build my own ontology, or maybe use a different source. Am trying higher dimensions; maybe the output will be better. Results improve slightly when going from 50 to 100 dimensions, so maybe it will work fine with 300. For the time being I will change the source, maybe to word2vec or fastText. Let's see.
Compound words give errors. This is not going to work; I will have to go to ConceptNet or my own ontology. Further testing has shown that the vectors are unreliable at 50 and 100 dimensions. Will check 300 dimensions later. So much for the deep learning hype.
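A crude workaround sketch for the compound-word problem: if a multi-word term is missing from the embedding vocabulary, average the vectors of its constituent words. The toy 2-d vectors stand in for the loaded GloVe dict:

```python
import numpy as np

# Toy stand-in for the GloVe dict; the compound "heart attack" is OOV.
vectors = {
    "heart": np.array([0.1, 0.3], dtype=np.float32),
    "attack": np.array([0.4, 0.2], dtype=np.float32),
}

def embed(term, vectors):
    """Return the vector for term, averaging constituent words if OOV."""
    if term in vectors:
        return vectors[term]
    parts = [vectors[w] for w in term.split() if w in vectors]
    if not parts:
        raise KeyError("no embedding for %r or its parts" % term)
    return np.mean(parts, axis=0)

print(embed("heart attack", vectors))  # averaged, since the compound is OOV
```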
A minor setback due to pretrained embeddings. No worries. My worst fears about word embeddings have been realized. NLIDB completion will take a bit more time.
This is for sure going to be one of the best unsupervised NLIDBs. No disrespect to the deep learning guys, but IMO it is not feasible to make a very high-precision NLIDB using deep learning alone. Show me one.
Looks like I will have to do some sequence modelling; some supervised learning is on the cards, so I will have to change my earlier stance. ConceptNet or WordNet has to perform, or else this is going to be tougher than expected.
ConceptNet should give better embeddings, as it uses a knowledge graph. Fingers crossed.
25th April
I have gone through all the standard approaches. IMO, my approach is pretty good. I need to make my code more compact.
Sometimes I feel like a genius, finding creative solutions to complex problems. Nucleus is behind schedule.
Working on Numberbatch, ConceptNet, and fastText. Hopefully my workload will be reduced if these work. Maybe I will train embeddings on NCBI data.
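Training domain embeddings would look roughly like this with gensim's word2vec, assuming a local corpus of NCBI abstracts with one sentence per line (the file names are placeholders; gensim 4.x uses `vector_size`, older versions call it `size`):

```python
from gensim.models import Word2Vec

# Read a pre-tokenized corpus: one sentence per line, whitespace-separated.
with open("ncbi_abstracts.txt", encoding="utf8") as f:
    sentences = [line.split() for line in f]

# Train 300-dimensional embeddings on the domain corpus.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
model.save("ncbi_w2v.model")

# Sanity check: nearest neighbours of a domain term.
print(model.wv.most_similar("protein", topn=5))
```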
Checked Numberbatch. I think pretrained embeddings will not work as the primary method, maybe only as a secondary one. Focusing on ConceptNet; if ConceptNet doesn't work, my workload will increase considerably.
ConceptNet looks promising, and I have the flexibility to add a few things of my own. This is definitely going to work; it looks pretty good so far. I am finalizing ConceptNet as one of the primary classification and scoring sources and will do more research to check out its possibilities. There are some advantages to using ConceptNet over my own graph database.
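For example, ConceptNet's public REST API (api.conceptnet.io) can serve as a scoring source by fetching the edges a term participates in; how those weights feed into classification is up to me, so the scoring use below is only illustrative:

```python
import requests

def related_concepts(term, limit=10):
    """Fetch (relation, neighbour, weight) edges for an English term."""
    url = f"http://api.conceptnet.io/c/en/{term}"
    edges = requests.get(url, params={"limit": limit}).json()["edges"]
    return [(e["rel"]["label"], e["end"]["label"], e["weight"]) for e in edges]

# Inspect the knowledge-graph neighbourhood of a term.
for rel, end, weight in related_concepts("doctor"):
    print(f"{rel:15s} {end:25s} {weight:.2f}")
```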
NLIDB progress is looking good. I think I have got some good scoring and classification methods. Hopefully the word embeddings trained on NCBI data will come through and add more robustness and reliability to the system. Will take a one-day break.
26th April
Lazing around today. I want a break from all the madness.
28th April
The last two days have not been productive; I didn't feel like working. Parsed some SyntaxNet output and added scoring parameters to enhance accuracy. I think I will have to go with regex for SLING. I was trying to avoid it, but there seems to be no other way around it; the only alternative is going through SLING's source code, and regex seems easier.

24th 
Good progress, albeit a bit slow. Was battling nihilistic tendencies. Also focusing on giving a few finishing touches to the weight-loss site.
