In this lab, you will learn how to create a virtual environment and install deep learning frameworks, such as TensorFlow, PyTorch, and Mindspore. You will also learn how to install Jupyter Notebook and JupyterLab. You will gain access to Github Student Developer Pack, which provides a lot of free tools for students, such as DataSpell and PyCharm. Finally, you will also gain access to our department’s Huawei Ascend 910 AI Server which supported by Mindspore natively.
Python Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them. It solves the “Project X depends on version 1.x but, Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable. There are several tools to create virtual environments in Python. In this lab, we will use venv and conda.
conda is a package manager, environment manager, and Python distribution that contains a lot of scientific packages. It is used to create virtual environments for Python containing both Python and non-Python packages. A conda environment is a directory that contains a specific collection of conda packages that you have installed even if you have multiple versions of Python on your computer, you can specify which Python installation each environment uses. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. If you change one environment, your other environments are not affected. You can easily activate or deactivate environment, which is how you switch between them. You can also share your environment with someone by giving them a copy of your environment.yaml file.
venv is a built-in module in Python 3 for creating virtual environments. It was added to Python in version 3.3. It’s intended to be the successor of the older virtualenv tool. virtualenv is a third party alternative (and predecessor) to venv. It allows you to create and manage separate environments with different versions of Python and/or packages installed in them.
Deep Learning Frameworks are software libraries that are designed to make building deep learning models fast, easy, and efficient. They provide a clear and concise way for defining models using a collection of pre-built and optimized components. Deep learning frameworks are often optimized for distributed training across multiple GPUs and machines. They also provide a suite of visualization tools to make it easier to understand, debug, and optimize your models.
TensorFlow is an open-source software library for machine learning across a range of tasks, and developed by Google to meet their needs for systems capable of building and training neural networks to detect and decipher patterns and correlations, analogous to the learning and reasoning which humans use. It is currently one of the most popular deep learning frameworks.
PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab (FAIR). It is free and open-source software released under the Modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.
Mindspore is a deep learning framework developed by Huawei. It is designed to provide development experience with friendly design and efficient execution for deep learning algorithms, covering model architecture search, training and inference. Mindspore is adaptable to all scenarios that require AI technologies.
Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
JupyterLab is the next-generation web-based user interface for Project Jupyter. JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. You can arrange multiple documents and activities side by side in the work area using tabs and splitters. Documents and activities integrate with each other, enabling new workflows for interactive computing.
conda
command is used to create, manage, and activate virtual environments. To create a virtual environment, run the following command:
conda create -n py39 python=3.9.16
The above command creates a virtual environment named py39
with Python version 3.9.16. To activate the virtual environment, run the following command:
conda activate py39
After that, you can install PyTorch using the following command:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install -U tensorflow
You can install Jupyter and other packages using the following command:
conda install jupyter matplotlib numpy pandas
python -m pip install -U diffusers==0.12.1 transformers==4.25.1 accelerate
You can verify Tensorflow installation by running the following command:
python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
DataSpell is a powerful IDE for data science and machine learning. It is based on PyCharm, which is one of the most popular Python IDEs. DataSpell provides a more powerful alternative to JupyterLab. It supports Jupyter Notebook, JupyterLab, and other popular data science tools. It also provides a lot of powerful features, such as code completion, code refactoring, version control, and so on.
Let’s apply for Github Student Developer Pack first. You can apply for it here. After you get the pack, you can activate DataSpell with the license provided by Github Student Developer Pack.
After that you could request a free subscription for DataSpell here. You will receive an email with a link to activate your subscription. After you activate your subscription, you will receive an email with a license key. You can activate DataSpell with the license key.
In DataSpell, you can configure the Python interpreter with existing conda environment. You can do this by clicking the Python interpreter in the status bar.
Please select the existing conda environment you created before.
In DataSpell, you can attach a directory to the project. You can do this by clicking the Attach Directory
button in the status bar.
Please select the directory you want to attach.
In DataSpell, you can create a new Jupyter Notebook by clicking the +
button in the status bar.
In DataSpell, you can run a Jupyter Notebook by clicking the Run
button in the status bar.
Previously, we have installed PyTorch. Now we can verify the installation by running Stable Diffusion Model.
from diffusers import StableDiffusionPipeline
import torch
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, revision='fp16', torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")
pipe.safety_checker = None
pipe.enable_attention_slicing()
pipe.enable_sequential_cpu_offload()
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, num_inference_steps=20).images[0]
image.save("astronaut_rides_horse.png")
Our department has a Huawei Ascend Server with mindspore
support. However, it does not support Tensorflow. In this section, we will show you how to set up venv for Tensorflow on Huawei Ascend Server. This server is only available for students in our department and will be used to run all lab experiments in this course.
You can connect to Huawei Ascend Server with SSH. You can do this by running the following command in your terminal:
ssh your_department_login@172.27.244.41
. /usr/local/Ascend/ascend-toolkit/set_env.sh
export HTTP_PROXY=http://proxy.hkbu.edu.hk:8080
export HTTPS_PROXY=http://proxy.hkbu.edu.hk:8080
/usr/local/python3.7.5/bin/python3 -m pip install -U pip
/usr/local/python3.7.5/bin/python3 -m pip install -U tensorflow
/usr/local/python3.7.5/bin/python3 -m pip install -U jupyter matplotlib numpy pandas
venv
module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories. Each virtual environment has its own Python binary (allowing creation of environments with various Python versions) and can have its own independent set of installed Python packages in its site directories.
To create a virtual environment, run the following command:
/usr/local/python3.7.5/bin/python3 -m venv ~/py37
It will create a virtual environment named py37
with Python version 3.7.5 in home directory. To activate the virtual environment, run the following command:
. /usr/local/Ascend/ascend-toolkit/set_env.sh
. ~/py37/bin/activate
export HTTP_PROXY=http://proxy.hkbu.edu.hk:8080
export HTTPS_PROXY=http://proxy.hkbu.edu.hk:8080
It’s recommended to upgrade pip before installing any packages in the virtual environment. To do this, you can upgrade pip using the following command:
pip install -U pip
After that, you can install Tensorflow using the following command:
pip install -U tensorflow
After that, you can install Jupyter Notebook using the following command:
pip install -U jupyter matplotlib numpy pandas
tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal. To start tmux, run the following command:
tmux
To start Jupyter Notebook Server, run the following command:
jupyter notebook --no-browser
Please note the port number of the Jupyter Server running on.
Press Ctrl+b
then d
to detach from tmux
. After that, we will need to disconnect from server by exit
command.
exit
SSH port forwarding allow us to access the Jupyter Server running on the server from local. To do this, run the following command:
ssh -L 8888:localhost:8888 your_department_login@172.27.244.41
Please replace 8888
with the port number that your Jupyter Server running on.
tmux attach
After that we may connect to the Jupyter Server running on the server from local. To do this, open your browser and go to http://localhost:8888
. You should see the Jupyter Notebook running on the server. You can create a new notebook and run the following code to verify the installation:
import tensorflow as tf
tf.config.list_physical_devices()
You may also connect the notebook server in DataSpell. To do this, open DataSpell and click Connect to Server
button in the status bar and enter the URL with token.
After that we may connect to the Jupyter Server running on the server from local. To do this, open your browser and go to http://localhost:8888
. You should see the Jupyter Notebook running on the server. You can create a new notebook and run the following code to verify the installation:
import numpy as np
import mindspore.context as context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.ops import operations as P
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
class Mul(nn.Cell):
def __init__(self):
super(Mul, self).__init__()
self.mul = P.Mul()
def construct(self, x, y):
return self.mul(x, y)
x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))
y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))
mul = Mul()
print(mul(x, y))
To run on Ascend AI processor, you can change the device target to Ascend
:
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
updating python interpreter...
In Menu Help
-> Diagnostics Tool
-> Debug Log Setting
, add #com.jetbrains.python.sdk.PythonSdkUpdater$Trigger
and restart DataSpell.