Do you have a blog that lacks a search engine, one that could find the article where you wrote about this and that? Let's check whether Elasticsearch can help with it. Check it out!
S0-E22/E30 :)
Elasticsearch stores your Pelican posts
This blog uses the Pelican engine, which takes reStructuredText or Markdown files and transforms them into HTML using a theme.
Because the published site is just static HTML, there is no search engine behind it. Fortunately you can use Elasticsearch: push your rst/md files into it and search for specific words in them :)
Let's check how to achieve this.
Making a search within your own .md files
If you are lucky enough to have some text files, you can start right away. For my proof of concept we'll use the GitHub repo of the Pelican blog.
Let's make a script that would put data from md/rst files into Elasticsearch.
Of course, let's not repeat ourselves: we'll reuse what we learned yesterday about the Elasticsearch Python client.
Listing files with a given extension in Python
import os
from glob import glob

def list_files(path, fileextension):
    return [y for x in os.walk(path) for y in glob(os.path.join(x[0], '*.{}'.format(fileextension)))]
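To sanity-check the helper without pointing it at a real blog checkout, here is a small sketch that runs it against a throwaway directory. The directory layout and file names are made up for the demo, and `demo_list_files` is a hypothetical wrapper, not part of the original script.

```python
import os
import tempfile
from glob import glob


def list_files(path, fileextension):
    # Same idea as the helper above: walk every directory, glob by extension.
    return [
        y
        for x in os.walk(path)
        for y in glob(os.path.join(x[0], '*.{}'.format(fileextension)))
    ]


def demo_list_files():
    # Hypothetical layout, created only for this demo.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'content', 'drafts'))
        for rel in ('content/a.rst', 'content/drafts/b.rst', 'content/c.md'):
            with open(os.path.join(root, *rel.split('/')), 'w') as handle:
                handle.write('sample')
        found = list_files(root, 'rst')
        # Return paths relative to the root so the result is stable.
        return sorted(
            os.path.relpath(p, root).replace(os.sep, '/') for p in found
        )


print(demo_list_files())
```

Only the two `.rst` files should come back, including the one in the nested `drafts` folder, while the `.md` file is skipped.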
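Putting file content into Elasticsearch
import datetime
import io
from elasticsearch import Elasticsearch, helpers

def import_rst_files():
    # mypath is assumed to be defined elsewhere as the root of the blog content
    all_rst_files = list_files(mypath, "rst")
    # Create the client once instead of once per file
    client = Elasticsearch('localhost')
    docs = []
    for rst_file in all_rst_files:
        # Read file content:
        content = read_file_content(rst_file)
        doc = {
            "_index": "blogpost-{}".format(datetime.datetime.now().strftime("%Y-%m-%d")),
            # Mapping types were removed in Elasticsearch 8; drop "_type" there
            "_type": "blogpost",
            "_id": rst_file,
            "_source": {
                "author": "PelicanblogAuthors",
                "content": content,
            }
        }
        docs.append(doc)
    # Send all documents in a single bulk request
    helpers.bulk(client, docs)

def read_file_content(filename):
    # io.open decodes UTF-8 on both Python 2 and 3 and closes the file afterwards
    with io.open(filename, 'r', encoding='utf-8') as handle:
        return handle.read()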
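One way to make the import step easier to test is to separate building the bulk actions from sending them. The sketch below is an assumption-laden variant, not the original script: `build_actions` is a hypothetical helper, the sample paths and the date are invented, and `_type` is left out on the assumption that a newer Elasticsearch is used. The resulting iterable could then be handed to `helpers.bulk(client, actions)`.

```python
import datetime


def build_actions(paths_and_contents, today=None):
    """Yield one bulk action per (path, content) pair.

    Hypothetical helper: it only builds dictionaries and never talks
    to Elasticsearch, so it can be checked without a running cluster.
    """
    today = today or datetime.date.today()
    index_name = "blogpost-{}".format(today.strftime("%Y-%m-%d"))
    for path, content in paths_and_contents:
        yield {
            "_index": index_name,
            "_id": path,  # the file path doubles as the document id
            "_source": {
                "author": "PelicanblogAuthors",
                "content": content,
            },
        }


# Made-up sample data and date, for illustration only.
sample = [("content/hello.rst", "Hello blog"), ("content/bye.rst", "Bye")]
actions = list(build_actions(sample, today=datetime.date(2017, 1, 22)))
print(actions[0]["_index"], actions[0]["_id"])
```

Because the actions are produced lazily, a large blog would not need to hold every post in memory before the bulk call.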
Making a very trivial search
import pprint
from elasticsearch_dsl import Search

def search_in_elastic(phrase_to_search="blog"):
    client = Elasticsearch('localhost')
    # No index is given, so the query runs against all indices
    s = Search(using=client).query('match', content=phrase_to_search)
    response = s.execute()
    return response

def print_found(search_response):
    # pprint.pprint works on both Python 2 and 3
    pprint.pprint(search_response.hits.hits)
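For reference, the `match` query used above boils down to a small JSON request body. The sketch below builds that body by hand so you can see what is sent to Elasticsearch; `match_query_body` is a hypothetical helper introduced only for illustration, and the snippet does not need a running cluster.

```python
import json


def match_query_body(field, phrase):
    # Plain-dict form of a full-text match query on a single field.
    return {"query": {"match": {field: phrase}}}


body = match_query_body("content", "blog")
print(json.dumps(body, sort_keys=True))
```

With `elasticsearch_dsl`, calling `.to_dict()` on the `Search` object should give you a comparable structure if you want to inspect the real query.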
Execution
import_rst_files()
found = search_in_elastic()
print_found(found)
This lists the files, reads them, puts them into Elasticsearch, and then prints all blog posts that contain the word 'blog'.
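The raw hits are quite verbose, so a shorter summary can be handy. The sketch below assumes the usual shape of an Elasticsearch search response (`hits.hits` with `_id`, `_score` and `_source` per entry) and works on a plain dictionary; the sample response is invented, and `summarize_hits` is a hypothetical helper.

```python
def summarize_hits(raw_response):
    # Reduce each hit to (document id, relevance score).
    return [
        (hit["_id"], hit["_score"])
        for hit in raw_response.get("hits", {}).get("hits", [])
    ]


# Made-up response, shaped like a search result, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "content/hello.rst", "_score": 1.2,
             "_source": {"content": "Hello blog"}},
            {"_id": "content/about.rst", "_score": 0.4,
             "_source": {"content": "About this blog"}},
        ]
    }
}

for doc_id, score in summarize_hits(sample_response):
    print(doc_id, score)
```

Since the file path was used as `_id` during import, the summary points straight back to the source file of each matching post.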
Bonus
You can find the code listed above in this repository.
Acknowledgements
- Elasticsearch with Django the easy way
- Recursive sub folder search and return files in a list python
- How do I list all files of a directory
Thanks!
That's it :) Comment, share or don't :)
If you have any suggestions for what I should blog about in the next articles, please give me a hint :)
See you tomorrow! Cheers!