31st Dec, 2019
One full form of NLP is Neuro-Linguistic Programming. It is a pseudoscientific approach to personal development, communication, and psychotherapy. The creators of Neuro-Linguistic Programming claim that there is a connection between neurological processes, language, and behavior, and that these can be changed through "programming". Its proponents have claimed that it can treat depression, phobias, psychosomatic illnesses, allergies, tic disorders, and learning disorders, often in a single session, but these claims are not supported by scientific evidence.
Neuro-Linguistic Programming works from metaphors about how the brain operates. In other words, it presents itself as a way to understand how people think and feel through their behavior and language patterns. It is also used for personal development.
Below are a few major features of NLP.
Q1. Explain what you understand by NLP?
Expanded as Natural Language Processing, NLP is the processing of natural language, such as speech and text, by software. It is a field of artificial intelligence that gives computers the ability to understand, analyze, and manipulate human language using algorithms. The history of NLP can be traced back to the 1950s, when Alan Turing proposed the Turing test as a criterion for machine intelligence. Natural Language Processing is used extensively in today's world. Some real-world examples are Google Translate, Google search prediction, autocorrect in mobile keyboards, online chatbots, etc.
Q2. What are the major components of NLP?
Natural Language Processing has five main components. They are:
Entity extraction – Here, the sentences are analyzed to identify the entities in them, such as persons, places, organizations, events, etc. The identified entities are grouped by type, and an importance factor is assigned to each.
Syntactic analysis – Here, the sentence is parsed to identify the relations between the words. The grammar of the sentence is also worked out in this analysis.
Semantic analysis – After the sentence has been analyzed to find relations and extract entities, semantic analysis is performed to find the meaning of the sentence independent of its context.
Sentiment analysis – This analysis is done to find the mood or attitude in the sentence. The sentence is analyzed to find its polarity, that is, whether it is positive or negative. A magnitude is also calculated, which gives the weight of that polarity.
Pragmatic analysis – Here, the statement is analyzed in context, using the preceding or succeeding sentences.
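The polarity and magnitude calculation described above can be sketched with a tiny word lexicon. This is only an illustration: the LEXICON scores and the sentiment function are hypothetical, and production systems usually rely on trained models rather than a hand-written word list.

```python
# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {
    "good": 0.6, "great": 0.9, "love": 0.8,
    "bad": -0.6, "terrible": -0.9, "hate": -0.8,
}

def sentiment(sentence):
    """Return (polarity, magnitude) for a sentence.

    polarity  : sum of the matched word scores (its sign says positive/negative)
    magnitude : sum of the absolute scores (how much sentiment there is overall)
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    polarity = sum(scores)
    magnitude = sum(abs(s) for s in scores)
    return polarity, magnitude

polarity, magnitude = sentiment("The food was good but the service was terrible.")
print("polarity:", polarity, "magnitude:", magnitude)
```

A mixed sentence like the one above ends up with a polarity close to zero but a clearly non-zero magnitude, which is why the two values are reported separately.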
Q3. What are some real-world applications of NLP?
There are many real-world applications of NLP. Some of them are:
Spam filtering – Gmail uses NLP to filter spam out from legitimate mail.
Digital assistants – Google Assistant, Amazon Alexa, and many other well-known digital assistants are built using NLP.
Autocorrect and autocomplete – Autocorrect in smartphone keyboards and autocomplete in Google search both use NLP.
Machine translation – This is one of the main areas where NLP is used. Translation software such as Google Translate and Microsoft Translator uses NLP heavily.
Search engines – Search engines use NLP to analyze the text the user enters and return relevant results.
Q4. What are some libraries for NLP?
Some of the popular NLP libraries are:
NLTK (Natural Language Toolkit) – Probably the most popular NLP library, widely used by developers around the world. Written in Python, it provides a broad range of features for Natural Language Processing.
spaCy – An optimized NLP library written in Python and Cython. Programs written with spaCy can be easily integrated with deep learning frameworks such as TensorFlow and PyTorch.
Pattern – A data mining library developed for Python. It includes tools for mining data from sources such as Google, Facebook, and Twitter, along with various NLP tools.
TextBlob – Built on NLTK and Pattern, TextBlob has a good API for common NLP operations.
Q5. Explain NLP Terminology?
Some of the common NLP terms are:
Q6. What is Lemmatization in NLP?
Lemmatization is the process of converting a word into its base form. In NLP, lemmatization considers the context and converts the word into a meaningful base form, called a lemma. Finding the correct lemma requires morphological analysis of the word and a dictionary to look it up in. Various libraries provide lemmatization, such as the WordNet lemmatizer in NLTK and the lemmatizer in spaCy.
For example, ‘caring’ is lemmatized into ‘care’.
Q7. Explain Latent Semantic Indexing in NLP?
LSA (Latent Semantic Analysis), sometimes referred to as LSI (Latent Semantic Indexing), is the process of analyzing the relationships between documents and the words they contain by converting them into vector form. The first step in LSA is to build a matrix with one row per unique word and one column per document, typically weighting each entry with a term frequency-inverse document frequency (TF-IDF) score. Then, LSA uses SVD (Singular Value Decomposition) to reduce the dimensionality of this matrix. In the resulting low-dimensional vector space, it is easy to find relationships between words or between documents by calculating the distance (or cosine similarity) between their vectors.
Q8. What is text mining in NLP?
Text mining is the process of analyzing unstructured textual data to gather valuable information from it. It combines various techniques such as data extraction, machine learning, and statistics. First, the unstructured data is gathered. Then, it is converted into a structured form using machine learning algorithms. Finally, useful information is extracted from it using statistics and text mining algorithms. This process is used in areas such as social media analysis, customer care, and fraud detection.
Q9. What is word embedding?
Word embedding in NLP is a modeling technique that maps words or phrases to vectors of real numbers. This is done to improve accuracy in tasks such as sentiment analysis and syntactic parsing. There are many ways to convert a word into a vector, such as GloVe, Word2Vec, or a trained embedding layer. Static word embeddings of this kind can’t represent a word that has multiple meanings as different vectors. That is, they conflate homonyms into a single vector.
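Once words are vectors, "similar meaning" becomes "small angle between vectors". The sketch below uses hand-made, hypothetical 3-dimensional vectors (real GloVe or Word2Vec vectors are learned from data and have hundreds of dimensions) to show how cosine similarity compares them.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "king":  [0.8, 0.7, 0.1],
    "queen": [0.8, 0.6, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))  # close to 1
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"]))  # much lower
```

Note that the dictionary holds exactly one vector per word, which is the limitation with homonyms mentioned above: a word like "bank" would get a single entry regardless of its meaning in a given sentence.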
Q10. What is Latent Dirichlet Allocation in NLP?
LDA (Latent Dirichlet Allocation) is a statistical technique that represents each document as a mixture of topics, where each topic is a probability distribution over words. The number of topics is defined in advance. The algorithm first randomly assigns a topic to every word in the documents. It then learns over time, iteratively updating these assignments until the words belonging to each topic are found with good probability. LDA is widely used for topic modelling in Natural Language Processing, giving the probability of each topic for a document (or sentence).
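The "assign topics randomly, then refine them over time" idea can be sketched with collapsed Gibbs sampling, one common way to fit LDA (the text above does not say which inference method is meant, so this choice is an assumption). The function name lda_gibbs, the hyperparameters alpha and beta, and the toy documents are all illustrative.

```python
import random

def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Return, for each document, a list of topic probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # words in doc d with topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # times word w has topic k
    topic_total = [0] * n_topics                            # words assigned to topic k
    assignments = []

    # Step 1: randomly assign a topic to every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Step 2: repeatedly resample each assignment given all the others.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old, wid = assignments[d][i], word_id[w]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][wid] -= 1
                topic_total[k_old] -= 1

                # Weight of topic k: how much doc d likes k times how much k likes w.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                r = rng.random() * sum(weights)
                k_new = 0
                while r > weights[k_new] and k_new < n_topics - 1:
                    r -= weights[k_new]
                    k_new += 1

                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][wid] += 1
                topic_total[k_new] += 1

    # Step 3: turn the counts into per-document topic probabilities.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

DOCS = [
    ["cat", "dog", "cat", "pet"],
    ["stock", "market", "trade", "stock"],
    ["dog", "pet", "cat"],
]
for doc, mixture in zip(DOCS, lda_gibbs(DOCS)):
    print(doc, [round(p, 2) for p in mixture])
```

Each printed row is a document's topic mixture and sums to 1. Because the sampler is random, which topic index ends up meaning "pets" or "finance" is arbitrary, and on such a tiny corpus the split is not guaranteed to be clean.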
Q11. What is the difference between lemmatizing and stemming?
Some differences between lemmatizing and stemming:
Stemming – It reduces a word by cutting off its beginning or end. This works most of the time, but not always, because it doesn’t use a vocabulary or morphological analysis.
Eg: Studies -> Studi (here it doesn't reduce the word to the correct base form)
Studying -> Study (here it reduces the word to the correct base form)
Lemmatization – It has the same goal as stemming, but it takes morphological analysis and vocabulary into consideration. As a result, it converts the word into a correct base form whenever the word is in its vocabulary.
Eg: Studies -> Study & Studying -> Study
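The contrast can be reproduced with two toy functions: a stemmer that blindly strips suffixes and a lemmatizer that looks words up in a dictionary. Both are simplified stand-ins written for this example (the names toy_stem, toy_lemmatize, SUFFIXES, and LEMMA_DICT are hypothetical); real stemmers and lemmatizers, such as those in NLTK or spaCy, use far larger rule sets and vocabularies.

```python
SUFFIXES = ("ing", "es", "s")  # tried in this order

def toy_stem(word):
    """Strip the first matching suffix, with no knowledge of real words."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Tiny hand-made vocabulary mapping inflected forms to their lemma.
LEMMA_DICT = {"studies": "study", "studying": "study", "caring": "care"}

def toy_lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_DICT.get(word, word)

for w in ["Studies", "Studying", "Caring"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The stemmer turns "Studies" into "studi" and "Caring" into "car", while the dictionary lookup returns "study" and "care". The price is that the lemmatizer only helps for words its vocabulary covers.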
Q12. What are distance-based classifiers?
Distance-based classification classifies objects based on the similarity or dissimilarity between them, as measured by a distance function. In NLP, the k-nearest neighbors (KNN) algorithm can be used for text classification. It is a simple supervised algorithm that groups or classifies data objects based on the distance between them.
It commonly uses the Euclidean distance to measure how far apart two objects are.
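A minimal sketch of k-nearest-neighbors text classification with Euclidean distance follows. It assumes each text is represented as a bag-of-words count vector, which is one simple choice among many; the training messages, labels, and function names are invented for the example.

```python
import math
from collections import Counter

def vectorize(text, vocabulary):
    """Bag-of-words vector: one word count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(text, training_data, k=3):
    """Label `text` by majority vote among its k nearest training documents."""
    vocabulary = sorted({w for doc, _ in training_data for w in doc.lower().split()})
    query = vectorize(text, vocabulary)
    distances = sorted(
        (euclidean(query, vectorize(doc, vocabulary)), label)
        for doc, label in training_data
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled messages.
TRAINING_DATA = [
    ("win free prize now", "spam"),
    ("free money win cash", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("project meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]

print(knn_classify("win a free prize", TRAINING_DATA))
print(knn_classify("notes from the project meeting", TRAINING_DATA))
```

Words that never appear in the training data (such as "a" or "from" here) are simply ignored by the vectorizer, so they do not affect the distance.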
Q13. What is TF-IDF?
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a word is to a document in a collection. It assigns a weight to a word based on the number of times the word appears in the document, and that weight is offset by the number of documents that contain the word. TF-IDF is calculated by multiplying the term frequency (the frequency of the word in a document) by the inverse document frequency (how rare or common the word is across the set of documents). TF-IDF scores are used in automated text analysis and as features for machine learning algorithms in NLP.
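The multiplication described above can be written out directly. This sketch assumes the plain textbook variant, where tf is the word count divided by the document length and idf is log(N / number of documents containing the word); libraries often apply smoothing or normalization on top, so their numbers will differ. The function name tf_idf and the sample documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

DOCS = ["the cat sat on the mat", "the dog chased the cat", "the bird sang"]
for doc, doc_scores in zip(DOCS, tf_idf(DOCS)):
    print(doc, {w: round(s, 3) for w, s in doc_scores.items()})
```

In this toy corpus "the" appears in every document, so its idf is log(1) = 0 and its score is 0 everywhere, while "mat", which occurs in only one document, scores higher than "cat", which occurs in two.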
Q14. Describe dependency parsing in NLP?
Dependency parsing, a form of syntactic parsing, assigns a syntactic structure to a sentence by linking each word to the word it depends on. The result is a parse tree for the sentence. It is useful for grammar checking and for the semantic analysis of sentences in NLP. An ambiguous sentence can have multiple valid parse trees, which makes dependency parsing a complex task.
Many libraries provide dependency parsing, such as the spaCy dependency parser and the NLTK dependency parser.
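To make the idea of a parse tree concrete, here is a hand-annotated dependency parse stored as plain Python data. No parser is run: the annotation for "She ate the apple" is written by hand, the relation names loosely follow Universal Dependencies labels, and the helper names root and dependents are made up for this sketch.

```python
# Each entry is (word, index of its head word, relation to the head).
# The root of the sentence has head index -1.
PARSE = [
    ("She",   1, "nsubj"),  # subject of "ate"
    ("ate",  -1, "root"),
    ("the",   3, "det"),    # determiner of "apple"
    ("apple", 1, "obj"),    # object of "ate"
]

def root(parse):
    """Return the word that has no head."""
    return next(word for word, head, _ in parse if head == -1)

def dependents(parse, head_index):
    """Return (word, relation) pairs for words attached to the given head."""
    return [(word, rel) for word, head, rel in parse if head == head_index]

print("root:", root(PARSE))
print("dependents of 'ate':", dependents(PARSE, 1))
print("dependents of 'apple':", dependents(PARSE, 3))
```

A parser such as the ones named above produces this kind of head-and-relation information automatically. For an ambiguous sentence, more than one such table can be valid.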
Q15. How to build ontologies?
There are several libraries for building an ontology. The OWL API (a Java library) and Python libraries such as Owlready2 offer support for creating and working with ontologies. FRED is a machine reader for the Semantic Web that can be used to build and design ontologies from natural language text.
Q16. What are a few tools for training NLP models?
Some of the popular tools for training the NLP models are,
Q17. What is a POS tagger?
A POS (Part of Speech) tagger is software that labels each word in a text with its part of speech, based on the word's definition and its context. The tagger reads a word and assigns it a part of speech such as noun, verb, or adjective. POS tagging, also called grammatical tagging or word-category disambiguation, is one of the steps in text analysis that helps uncover the meaning of the text.
The algorithms used by POS taggers fall into two types: rule-based and stochastic.
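A rule-based tagger can be sketched in a few lines: look the word up in a small lexicon, otherwise guess from its suffix. The lexicon, the suffix rules, and the tag names below are hypothetical and far too small for real use. A stochastic tagger would instead learn tag probabilities from annotated text.

```python
import re

# Hypothetical closed-class lexicon: words whose tag is always the same.
LEXICON = {"the": "DET", "a": "DET", "is": "VERB", "on": "ADP"}

# Suffix rules, tried in order; the first matching pattern wins.
SUFFIX_RULES = [
    (r".*ing$", "VERB"),
    (r".*ed$", "VERB"),
    (r".*ly$", "ADV"),
    (r".*(ous|ful|able)$", "ADJ"),
]

def rule_based_tag(sentence):
    """Tag each whitespace-separated token; unknown words default to NOUN."""
    tagged = []
    for token in sentence.lower().split():
        if token in LEXICON:
            tag = LEXICON[token]
        else:
            tag = next(
                (t for pattern, t in SUFFIX_RULES if re.match(pattern, token)),
                "NOUN",
            )
        tagged.append((token, tag))
    return tagged

print(rule_based_tag("The dog quickly jumped"))
```

Because the rules ignore context, a word like "building" is always tagged as a verb here, even when it is used as a noun. Resolving such cases is what the context-based disambiguation mentioned above is for.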
Q18. What is shallow parsing?
Shallow parsing analyzes a sentence to find its parts of speech, such as nouns, verbs, and adjectives.
After finding the parts of speech, it links them together into groups to find the grammatical structure of the text. It is similar to POS tagging but takes it one step further to find verb groups, noun groups, etc. Shallow parsing is used heavily in Natural Language Processing.
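The grouping step can be sketched as follows, assuming the sentence has already been POS-tagged. The rule used here (a run of determiners, adjectives, and nouns that contains at least one noun forms a noun group) is a simplification chosen for the example, and the tag names and function name are hypothetical.

```python
NP_TAGS = {"DET", "ADJ", "NOUN"}  # tags allowed inside a noun group

def chunk_noun_phrases(tagged_tokens):
    """Group consecutive DET/ADJ/NOUN tokens into noun groups."""
    chunks, current = [], []
    # The sentinel token at the end flushes the final run.
    for word, tag in tagged_tokens + [("", "END")]:
        if tag in NP_TAGS:
            current.append((word, tag))
        else:
            # Only keep the run if it actually contains a noun.
            if any(t == "NOUN" for _, t in current):
                chunks.append(" ".join(w for w, _ in current))
            current = []
    return chunks

TAGGED = [
    ("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
    ("jumped", "VERB"), ("over", "ADP"),
    ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
]
print(chunk_noun_phrases(TAGGED))
```

The output is a flat list of noun groups rather than a full parse tree, which is what makes this kind of parsing "shallow".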
Q19. What is NLTK?
NLTK, expanded as Natural Language Toolkit, is an open-source Python library used for Natural Language Processing. Released in 2001 by Steven Bird and Edward Loper, NLTK is one of the most popular NLP packages and supports a wide variety of algorithms and statistical methods for text analysis. It also ships with sample data to work with. NLTK is mainly used in research and teaching.
NLTK has support for text classification, stemming, tagging, and parsing.
Q20. What are NLU and NLG?
NLU, expanded as Natural Language Understanding, is a component of NLP that is used to understand the meaning of natural language, whether it is in spoken or text form. It uses tools such as POS taggers and parsers to find the meaning of the language and to build applications. It can be defined as the process of reading and interpreting language.
NLG, expanded as Natural Language Generation, is a component of NLP that is used to generate natural language. It uses POS tags, parsing results, and other information to generate natural language by machine. It can be defined as the process of writing or generating language.
Q21. What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is an open-source NLP model developed by researchers at Google. It uses bidirectional training to learn about text. After being trained on a corpus of billions of words, BERT has a good understanding of how sentences work.
BERT also makes use of the Transformer (an architecture built on attention mechanisms) to learn the contextual relations between words in a text. BERT took Natural Language Processing to the next level and created a big stir in the machine learning community.