Web Deployment of Genomics Machine Learning Models Using Flask Web Framework
1. Machine Learning Models Deployment and Maintenance
2. Using Flask Web Framework to Develop Machine Learning RESTful Web APIs
3. DNA Sequence String Validation
4. Applying Long Short-Term Memory Networks (LSTM) for DNA Sequence Classification
5. Single DNA Sequence Binary Classification Using Web UI
6. Multiple DNA Sequences Binary Classification Using a CSV File
7. Conclusions
1. Machine Learning Models Deployment and Maintenance
The last phase of a Machine Learning project workflow is to deploy the final model into a production environment, where it supports practical business decisions based on selected data. Deploying and maintaining Machine Learning models is the responsibility of Machine Learning Engineers: IT developers in charge of researching, designing, building and maintaining Artificial Intelligence applications that automate Machine Learning model deployment.
There are three common ways to deploy Machine Learning models in production today. Let’s look at each of them:
1. Web Services: design and develop Web APIs. Today, Web APIs are generally implemented as RESTful services or gRPC microservices. “gRPC is roughly 7 times faster than REST when receiving data and roughly 10 times faster than REST when sending data for this specific payload. This is mainly due to the tight packing of the Protocol Buffers and the use of HTTP/2 by gRPC” (gRPC vs. REST: How Does gRPC Compare with Traditional REST APIs?).
2. On-demand batch predictions: implement offline Machine Learning models. These models can be optimized to handle a high volume of job instances and to run more complex models. However, in a standard batch processing framework, these models may require multiple development stages. In general, batch prediction is useful when we need to generate predictions for a set of observations all at once and then make decisions on a certain percentage or number of those observations.
3. Embedded models in edge and mobile devices: designing and developing applications for edge devices such as mobile phones and IoT devices has become very popular. Because the hardware of these devices limits their computation power and storage capacity, the Machine Learning models deployed on them must be kept simple, using techniques such as quantization and aggregation, while maintaining accuracy.
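Option 2 above scores a whole set of observations in one pass and then acts on a share of them. The minimal sketch below illustrates that pattern under stated assumptions. The function name `batch_predict_top_fraction` is hypothetical, and so is the `gc_content` scoring rule, which stands in for a trained model's predict function. Neither comes from this article.

```python
from __future__ import annotations

from typing import Callable, Sequence


def batch_predict_top_fraction(
    observations: Sequence[str],
    score: Callable[[str], float],
    fraction: float,
) -> list[tuple[str, float]]:
    """Score every observation in one batch, then keep the top `fraction` by score.

    `score` is a hypothetical stand-in for a trained model's predict function.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # One offline pass over the whole batch, as in on-demand batch prediction.
    scored = [(obs, score(obs)) for obs in observations]
    # Highest scores first; keep the requested share, but at least one item.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    keep = max(1, int(len(scored) * fraction))
    return scored[:keep]


def gc_content(seq: str) -> float:
    """Hypothetical scoring rule (share of G and C bases) standing in for a model."""
    return (seq.count("G") + seq.count("C")) / len(seq)


batch = ["ATAT", "GCGC", "ATGC", "GGGA"]
print(batch_predict_top_fraction(batch, gc_content, 0.5))
# prints [('GCGC', 1.0), ('GGGA', 0.75)]
```

A real batch job would replace `gc_content` with the model's vectorized prediction call. The selection step after scoring stays the same.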
2. Using Flask Web Framework to Develop Machine Learning RESTful Web APIs
Flask is a web application microframework written in Python. It was developed by Armin Ronacher, who led a team of international Python enthusiasts called Pocco. Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine, both of which are Pocco projects. Flask is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components for which pre-existing third-party libraries provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework-related tools. In my blog paper Using C# to call Python RESTful API Web Services with Machine Learning Models, I explain how to build Python RESTful API web services with trained ML models that any C# .NET business application can consume in production Data Analytics projects. Feel free to read it; I think you’ll learn a lot about RESTful API Web Services and JSON data class encapsulation.
In this paper, Flask is used to develop simple genomics RESTful Web APIs. The following genomics APIs were developed for DNA sequence binary classification.
```python
from flask import Flask

app = Flask(__name__)

# Route signatures only: the function bodies are not shown in this article.


@app.route("/api/v1/getdnaclassapi", methods=["POST", "GET"])
def getdnaclassapi():
    """Classify a single DNA sequence using the generic API procedure.

    Returns:
        dnaclass: DNA sequence binary class (0 or 1)
    """


@app.route("/dnasequence", methods=["GET"])
def dnasequence():
    """Browse the deep_learning_genomics_webpage.html page for single DNA sequence classification.

    Returns:
        deep_learning_genomics_webpage.html: web page
    """


@app.route("/dnafile", methods=["GET"])
def dnafile():
    """Browse the deep_learning_genomics_webpage_browse.html page for multiple DNA sequence classification using a CSV file.

    Returns:
        deep_learning_genomics_webpage_browse.html: web page
    """


@app.route("/api/v1/getdnaclasshtml", methods=["POST", "GET"])
def getdnaclasshtml():
    """Classify a single DNA sequence using the deep_learning_genomics_webpage.html page.

    Returns:
        deep_learning_genomics_webpage.html: web page for a single DNA sequence classification
    """


@app.route("/api/v1/getdnaclasshtmlbrowse", methods=["POST", "GET"])
def getdnaclasshtmlbrowse():
    """Classify multiple DNA sequences from a CSV file using deep_learning_genomics_webpage_browse.html.

    Returns:
        deep_learning_genomics_webpage_browse.html: web page for multiple DNA sequences classification
    """
```
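The body of `getdnaclassapi` is not shown in this article. The minimal sketch below is one way such a JSON endpoint could be written in Flask; it is not the author's implementation. It assumes the request carries `{"dnasequence": "..."}`, as in the client example later in this article, and it returns the class as plain text. `classify_stub` and `is_valid_sequence` are hypothetical helpers. The stub is a toy rule standing in for the saved LSTM model and will not reproduce the model's predictions.

```python
from flask import Flask, request

app = Flask(__name__)

DNA_SEQUENCE_LENGTH = 50  # assumption: mirrors config.DNA_SEQUENCE_LENGTH
NUCLEOTIDES = frozenset("ACGT")


def classify_stub(dna_sequence: str) -> int:
    """Hypothetical toy rule standing in for the saved LSTM model; returns 0 or 1."""
    return int(dna_sequence.count("G") > dna_sequence.count("A"))


def is_valid_sequence(value) -> bool:
    """Simplified check: a 50-character string made only of A, C, G and T."""
    return (
        isinstance(value, str)
        and len(value) == DNA_SEQUENCE_LENGTH
        and set(value) <= NUCLEOTIDES
    )


@app.route("/api/v1/getdnaclassapi", methods=["POST", "GET"])
def getdnaclassapi():
    """Classify one DNA sequence posted as JSON: {"dnasequence": "..."}."""
    # silent=True returns None instead of raising on a missing or malformed body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    dna_sequence = payload.get("dnasequence")
    if not is_valid_sequence(dna_sequence):
        # Plain-text error body, so a client that prints response.text still works.
        return "Invalid DNA sequence", 400
    # Plain-text body such as "0", which the client reads via response.text.
    return str(classify_stub(dna_sequence))
```

Flask's test client (`app.test_client()`) can exercise the route without starting a server. Calling `app.run(host="127.0.0.1", port=5500)` would serve it on the port used later in this article.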
The following functions are part of the genomics RESTful API library. The APIs call them to carry out the Machine Learning steps: validating the input, preparing the DNA sequence data and classifying it with the trained model.
```python
def dna_sequence_validation(dna_sequence):
    """Validate a DNA sequence with is_string_empty_none(dna_sequence),
    is_dna(dna_sequence) and len(dna_sequence).

    Args:
        dna_sequence: a DNA sequence
    Returns:
        tuple (is_valid, validation_message); validation_message is None when valid
    """


def get_dna_class(dna_sequence_type, *dna_sequence_data):
    """Define the type of DNA sequence data frame for single and multiple DNA sequence classification.

    Args:
        dna_sequence_type: single DNA sequence or CSV file
        dna_sequence_data: DNA sequence data entry type
    Returns:
        dna_string: classified binary DNA sequence (0 or 1)
    """


def apply_ml_model_clasifier(df_dna_sequence):
    """Classify a DNA sequence dataset (single or multiple) using a machine learning model.

    Args:
        df_dna_sequence: DNA sequence pandas data frame
    Returns:
        predicted_dna_sequence: single DNA sequence or DNA sequence array (multiple)
    """
```
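The docstring of `get_dna_class` suggests that it normalizes single and multiple inputs into one batch before classification. The sketch below shows that dispatch idea. It is a hypothetical variant, not the author's code. The type strings `"single"` and `"multiple"` are assumptions. The keyword-only `classifier` argument stands in for `apply_ml_model_clasifier`, and the sketch passes it a plain list instead of a pandas data frame so that it needs only the standard library.

```python
from typing import Callable, List

# Hypothetical stand-in for apply_ml_model_clasifier: maps a batch of
# sequences to one 0/1 class per sequence.
Classifier = Callable[[List[str]], List[int]]


def get_dna_class(dna_sequence_type: str, *dna_sequence_data: str,
                  classifier: Classifier) -> List[int]:
    """Normalize single or multiple DNA sequence input into one batch, then classify it."""
    if not dna_sequence_data:
        raise ValueError("no DNA sequence data supplied")
    if dna_sequence_type == "single":
        # Only the first positional value is used for a single sequence.
        sequences = [dna_sequence_data[0]]
    elif dna_sequence_type == "multiple":
        sequences = list(dna_sequence_data)
    else:
        raise ValueError(f"unknown dna_sequence_type: {dna_sequence_type!r}")
    # Normalize whitespace and case so both paths feed the model identically.
    return classifier([s.strip().upper() for s in sequences])
```

Funneling both entry points through a single batch call means the model code handles only one input shape.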
3. DNA Sequence String Validation
Before the genomics Machine Learning models are applied, the DNA sequence string must be checked for empty or None values, for a valid DNA definition and for the correct sequence length. Below is the code of the dna_sequence_validation(dna_sequence) function.
```python
import config  # author's settings module; defines DNA_SEQUENCE_LENGTH (50)
import PyDNA   # author's genomics helper library; the actual import path may differ


def dna_sequence_validation(dna_sequence):
    """Validate a DNA sequence with PyDNA.is_string_empty_none(dna_sequence),
    PyDNA.is_dna(dna_sequence) and len(dna_sequence).

    Args:
        dna_sequence: DNA sequence string
    Returns:
        tuple (is_valid, validation_message); validation_message is None when valid
    """
    is_valid = False
    try:
        # Check None explicitly: str(None) is the non-empty string "None".
        if dna_sequence is None or PyDNA.is_string_empty_none(str(dna_sequence)):
            validation_message = "The DNA sequence is empty or it has None value. Enter a valid DNA sequence"
        elif not PyDNA.is_dna(dna_sequence):
            validation_message = "The DNA sequence is not a valid DNA one."
        elif len(dna_sequence) != config.DNA_SEQUENCE_LENGTH:
            # Report the configured length instead of a hard-coded 50.
            validation_message = "The DNA sequence length is not equal to {} nucleotides long.".format(
                config.DNA_SEQUENCE_LENGTH)
        else:
            is_valid = True
            validation_message = None
    except Exception as e:
        validation_message = "An error occurred: {} in function {}".format(e, dna_sequence_validation.__name__)
    return is_valid, validation_message
```
As you can see, the code uses the PyDNA genomics library developed in my blog paper “Apply Machine Learning Algorithms for Genomics Data Classification”. Let’s look at the following three examples of DNA sequence string validation.
1. dna_sequence_json = {"dnasequence": ""}
   Result: “The DNA binary class result: The DNA sequence is empty or it has None value. Enter a valid DNA sequence”
2. dna_sequence_json = {"dnasequence": "ACTCGCTGTCCACGTCTATTCCTAGGGGTTTTATTTCGCAAGGTGATACTFFF"}
   Result: “The DNA binary class result: The DNA sequence is not a valid DNA one.”
3. dna_sequence_json = {"dnasequence": "ACTCGCTGTCCACGTCTATTCCTAGGGGTTTTATTTCGCAAGGTGATACTA"}
   Result: “The DNA binary class result: The DNA sequence length is not equal to 50 nucleotides long.”
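The PyDNA helpers are not shown here. The self-contained sketch below reproduces the three cases above using hypothetical stand-ins for `PyDNA.is_string_empty_none` and `PyDNA.is_dna`. It assumes a valid sequence contains only A, C, G and T (case-insensitive) and is exactly 50 nucleotides long. The real PyDNA rules may differ.

```python
DNA_SEQUENCE_LENGTH = 50  # assumption: mirrors config.DNA_SEQUENCE_LENGTH
VALID_NUCLEOTIDES = frozenset("ACGT")


def is_string_empty_none(value) -> bool:
    """Hypothetical stand-in for PyDNA.is_string_empty_none."""
    return value is None or str(value).strip() == ""


def is_dna(value) -> bool:
    """Hypothetical stand-in for PyDNA.is_dna: only A, C, G and T are allowed."""
    return isinstance(value, str) and bool(value) and set(value.upper()) <= VALID_NUCLEOTIDES


def validate_sequence(dna_sequence):
    """Same check order as dna_sequence_validation: empty, then alphabet, then length."""
    if is_string_empty_none(dna_sequence):
        return False, "The DNA sequence is empty or it has None value. Enter a valid DNA sequence"
    if not is_dna(dna_sequence):
        return False, "The DNA sequence is not a valid DNA one."
    if len(dna_sequence) != DNA_SEQUENCE_LENGTH:
        return False, f"The DNA sequence length is not equal to {DNA_SEQUENCE_LENGTH} nucleotides long."
    return True, None


base = "ACTCGCTGTCCACGTCTATTCCTAGGGGTTTTATTTCGCAAGGTGATACT"  # 50 nucleotides
for candidate in ("", base + "FFF", base + "A", base):
    print(validate_sequence(candidate))
```

The order of the checks matters. Example 2 is 53 characters long, but it is rejected as non-DNA before its length is ever tested.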
4. Applying Long Short-Term Memory Networks (LSTM) for DNA Sequence Classification
In my blog paper “Apply Machine Learning Algorithms for Genomics Data Classification”, a Long Short-Term Memory (LSTM) network model was designed and developed. It provided the best metric results, shown below.
```
Model Validation
valid accuracy score: 98.5
valid precision: 98.545
valid recall: 98.5
valid f1 score: 98.5
valid confusion matrix:
[[99  3]
 [ 0 98]]
valid classification report:
              precision    recall  f1-score   support

           0       1.00      0.97      0.99       102
           1       0.97      1.00      0.98        98

    accuracy                           0.98       200
   macro avg       0.99      0.99      0.98       200
weighted avg       0.99      0.98      0.99       200

Model Test
test accuracy score: 99.5
test precision: 99.505
test recall: 99.5
test f1 score: 99.5
test confusion matrix:
[[100   1]
 [  0  99]]
test classification report:
              precision    recall  f1-score   support

           0       1.00      0.99      1.00       101
           1       0.99      1.00      0.99        99

    accuracy                           0.99       200
   macro avg       0.99      1.00      0.99       200
weighted avg       1.00      0.99      1.00       200
```
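The single-number precision, recall and F1 scores above appear to match support-weighted averages over the two classes, which scikit-learn calls `average="weighted"`. The stdlib-only sketch below recomputes them from a confusion matrix. It assumes the scikit-learn layout, where rows are true classes and columns are predicted classes. `weighted_metrics` is a hypothetical helper name.

```python
from typing import List, Tuple


def weighted_metrics(cm: List[List[int]]) -> Tuple[float, float, float, float]:
    """Accuracy plus support-weighted precision, recall and F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for k in range(n):
        support = sum(cm[k])                          # row sum: true members of class k
        predicted = sum(cm[i][k] for i in range(n))   # column sum: predicted as class k
        p = cm[k][k] / predicted if predicted else 0.0
        r = cm[k][k] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                      # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


test_cm = [[100, 1], [0, 99]]
print([round(100 * m, 3) for m in weighted_metrics(test_cm)])
# prints [99.5, 99.505, 99.5, 99.5]
```

The validation matrix `[[99, 3], [0, 98]]` gives a weighted precision of about 98.545 in the same way. This is why precision is the only score that differs from accuracy in both runs.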
As you can see, this LSTM network obtains a test accuracy score of 99.5%, which is an excellent result. I also ran LSTM algorithms on a couple more genomic datasets, and so far LSTM has provided the best results for DNA sequence classification. The LSTM genomics model was saved as “lstm_genomic_model.h5” for application deployment.
5. Single DNA Sequence Binary Classification Using Web UI
A simple way to consume the genomics RESTful APIs is to call them from any server-side programming language. For example, the four core lines of Python code shown below classify a single DNA sequence by posting the JSON payload dna_sequence_json.
```python
import requests

import config  # author's settings module; defines DNA_CLASS_API_URL_STRING

api_url = config.DNA_CLASS_API_URL_STRING
dna_sequence_json = {"dnasequence": "ACTCGCTGTCCACGTCTATTCCTAGGGGTTTTATTTCGCAAGGTGATACT"}
dna_class = requests.post(url=api_url, json=dna_sequence_json)
print("The DNA binary class result: {}".format(dna_class.text))
```

Result: The DNA binary class result: 0
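The same call can be made without the third-party `requests` package. The sketch below uses only Python's standard `urllib.request`. The URL is an assumption, built from the `/api/v1/getdnaclassapi` route and port 5500 that appear in this article; the article itself reads the real value from `config.DNA_CLASS_API_URL_STRING`. Building the request is kept separate from sending it, so the payload can be inspected without a running server.

```python
import json
import urllib.request

# Assumption: route and port taken from this article; the real value lives in
# config.DNA_CLASS_API_URL_STRING.
API_URL = "http://localhost:5500/api/v1/getdnaclassapi"


def build_dna_class_request(dna_sequence: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST like requests.post(url, json=...)."""
    body = json.dumps({"dnasequence": dna_sequence}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def classify_remote(dna_sequence: str, url: str = API_URL) -> str:
    """Send the request and return the response body as text (needs the API running)."""
    with urllib.request.urlopen(build_dna_class_request(dna_sequence, url), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the Flask service running, `classify_remote("ACTCGCTGTCCACGTCTATTCCTAGGGGTTTTATTTCGCAAGGTGATACT")` should return the same text body that `dna_class.text` prints.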
There is no doubt that this is simple, yet it still requires writing Python code. A better deployment approach is a User Interface (UI) that anyone can use without writing any code at all; company management personnel, in particular, would appreciate such a UI. Today, web applications are the most common way to deploy Machine Learning models, although I have also seen companies deploy their Machine Learning models as client-server network applications.
Based on the developed genomics RESTful APIs, two web pages have been implemented using the following endpoint URLs:
- http://localhost:5500/dnasequence — for a single DNA sequence classification
- http://localhost:5500/dnafile — for multiple DNA sequences classification with a CSV file
The default Windows 10 localhost address (http://127.0.0.1) is used, with port 5500 open. Let’s look at these endpoint URLs. Open Chrome and go to http://localhost:5500/dnasequence. The deep_learning_genomics_primer_webpage.html page will appear.
Enter the DNA sequence string that you would like to classify.
Click the ‘Classify’ button to get the predicted DNA binary class. In this case, the DNA class is 0.
This genomics web application can be deployed on any web server and used by company end-users, who only need to enter the DNA sequence string and click the ‘Classify’ button. The same result can be obtained with the Postman API client, as shown in the figure below.
6. Multiple DNA Sequences Binary Classification Using a CSV File
In some cases, we need to classify multiple DNA sequences. Typically, a CSV file is provided with one DNA sequence per row, as in the example below.
Open Chrome and go to the endpoint http://localhost:5500/dnafile. The deep_learning_genomics_primer_webpage_browse.html page will appear.
Click the ‘Choose File’ button to select a CSV file. Here, the dna_sequence_protein_ten_tests.csv file was chosen; it contains the ten unclassified DNA sequence rows shown above.
Click the ‘Classify’ button. The page will show the ten classified DNA sequences.
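The CSV workflow above reads the rows, classifies them and displays each sequence next to its class. The stdlib-only sketch below shows that round trip. It assumes, hypothetically, that the file has a header row with a `dnasequence` column; the real layout of dna_sequence_protein_ten_tests.csv is not shown here. The `classifier` argument stands in for the saved LSTM model, and the added `dnaclass` column name is also an assumption.

```python
import csv
import io
from typing import Callable, List


def classify_csv(csv_text: str, classifier: Callable[[List[str]], List[int]]) -> str:
    """Return the CSV text with an extra "dnaclass" column holding each row's class.

    Assumes a header row with a "dnasequence" column (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Classify all rows in one batch call rather than one model call per row.
    predictions = classifier([row["dnasequence"].strip() for row in rows])
    if len(predictions) != len(rows):
        raise ValueError("classifier must return exactly one class per row")
    out = io.StringIO()
    fieldnames = list(reader.fieldnames or []) + ["dnaclass"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, label in zip(rows, predictions):
        writer.writerow({**row, "dnaclass": label})
    return out.getvalue()
```

For example, `classify_csv("dnasequence\nACGT\nAGGT\n", lambda seqs: [int("GG" in s) for s in seqs])` returns `"dnasequence,dnaclass\nACGT,0\nAGGT,1\n"`. A web page could render that output as a table.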
7. Conclusions
1. Deploying and maintaining Machine Learning models in a production environment is the required final phase of data science projects.
2. The Flask web microframework provides a simple way to design and develop genomics RESTful Web APIs and UIs for end-users.
3. Developed genomics Web APIs allow binary classification of single and multiple DNA sequence strings.
In the future, I’ll be covering how to design and develop genomics Web APIs using Google gRPC microservices.